Sequence-to-Sequence

Encoder-Decoder Architecture

Part 2: Core Architectures (38 slides)

The Bottleneck Problem: How do you compress "The quick brown fox jumps over the lazy dog" into a single vector without losing meaning? You can't - and that's exactly why attention was invented. Instead of one summary, attention lets the decoder "look back" at every input word.
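The "look back" idea can be made concrete with a minimal NumPy sketch of soft (dot-product) attention: score each encoder state against the current decoder state, normalize with a softmax, and take the weighted sum as the context vector. The array values and shapes here are illustrative, not from the lecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_attention(decoder_state, encoder_states):
    """Dot-product attention: score every encoder state against the
    current decoder state, normalize, and return the weighted sum."""
    scores = encoder_states @ decoder_state   # one score per input position, shape (T,)
    weights = softmax(scores)                 # attention distribution, sums to 1
    context = weights @ encoder_states        # context vector, shape (d,)
    return context, weights

# Toy example: 4 encoder positions, hidden size 3 (random, for illustration).
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))   # encoder hidden states
s = rng.normal(size=3)        # current decoder state
context, weights = soft_attention(s, H)
```

Because the weights are recomputed for every decoder step, each output word gets its own context vector instead of one fixed summary of the whole input.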

Prerequisites

  • Week 3: LSTM networks and sequential processing
  • Understanding of encoder-decoder concepts
  • Matrix multiplication basics

Overview

Map input sequences to output sequences: the foundation of machine translation and summarization.

Learning Objectives

  • Explain the encoder-decoder architecture for sequence transformation
  • Identify the information bottleneck problem in basic seq2seq
  • Calculate attention scores between encoder and decoder states
  • Implement soft attention mechanism conceptually
  • Compare different attention variants (additive, multiplicative)
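The two attention variants named in the last objective differ only in how the score between a decoder state s and an encoder state h is computed. A hedged NumPy sketch, with illustrative random weight matrices (the dimensions and parameter names are assumptions, not lecture notation):

```python
import numpy as np

d = 3
rng = np.random.default_rng(1)
s = rng.normal(size=d)   # decoder state
h = rng.normal(size=d)   # one encoder state

# Multiplicative (Luong-style): score = s^T W h, a single bilinear form.
W = rng.normal(size=(d, d))
mult_score = s @ W @ h

# Additive (Bahdanau-style): score = v^T tanh(W1 s + W2 h),
# a small one-hidden-layer network over the two states.
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)
add_score = v @ np.tanh(W1 @ s + W2 @ h)
```

Either scalar score is then fed through a softmax across all encoder positions to produce the attention weights.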

Key Topics

  • Encoder-decoder
  • Attention mechanism
  • Teacher forcing
  • Beam search
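Of the topics above, beam search is the one used at decoding time: instead of greedily taking the single best token at each step, keep the k highest-probability partial sequences. A minimal sketch with a toy model whose per-step distributions do not depend on the prefix (a simplification for illustration only):

```python
import heapq
import math

def beam_search(step_probs, beam_width=2):
    """Keep the beam_width highest log-probability partial sequences.
    step_probs[t] maps each token to its probability at step t."""
    beams = [(0.0, [])]  # (cumulative log-prob, token sequence)
    for probs in step_probs:
        candidates = []
        for logp, seq in beams:
            for tok, p in probs.items():
                candidates.append((logp + math.log(p), seq + [tok]))
        beams = heapq.nlargest(beam_width, candidates)  # prune to the beam
    return beams

# Toy distributions over a 3-token vocabulary at each of 2 steps.
steps = [{"a": 0.5, "b": 0.4, "c": 0.1},
         {"a": 0.1, "b": 0.3, "c": 0.6}]
best = beam_search(steps, beam_width=2)  # best[0] is the top hypothesis
```

With beam_width=1 this reduces to greedy decoding; wider beams trade compute for a better chance of finding the highest-probability full sequence.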

Key Concepts

  • Encoder: compresses the input sequence into a context vector
  • Decoder: generates the output sequence from the context
  • Information bottleneck: a fixed-size context loses information
  • Attention mechanism: dynamic weighting of encoder states
  • Attention scores: relevance of each encoder position to the decoder
  • Context vector: weighted sum of encoder states (per decoder step)

Key Visualizations

Week4 Encoder Decoder Flow
Week4 Attention Heatmap
Week4 Beam Search Tree
Week4 Seq2Seq Architecture Minimalist

Resources

Moodle Resources (HS25)

Lecture Slides

Student Handouts