Sequence-to-Sequence
Encoder-Decoder Architecture
38 SLIDES Part 2: Core Architectures
The Bottleneck Problem: How do you compress "The quick brown fox jumps over the lazy dog" into a single vector without losing meaning? You can't, and that's exactly why attention was invented. Instead of relying on one fixed summary, attention lets the decoder "look back" at every input word.
Prerequisites
- Week 3: LSTM networks and sequential processing
- Understanding of encoder-decoder concepts
- Matrix multiplication basics
Overview
Map sequences to sequences. Foundation of machine translation and summarization.
Learning Objectives
- Explain the encoder-decoder architecture for sequence transformation
- Identify the information bottleneck problem in basic seq2seq
- Calculate attention scores between encoder and decoder states
- Implement soft attention mechanism conceptually
- Compare different attention variants (additive, multiplicative)
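The last two objectives can be sketched numerically. Below is a minimal NumPy illustration of the two scoring variants, multiplicative (dot-product) and additive (Bahdanau-style); the dimensions and the randomly initialized weights are toy values for illustration only, not part of the course material.

```python
import numpy as np

np.random.seed(0)
d = 4                       # hidden size (toy value)
H = np.random.randn(5, d)   # 5 encoder hidden states h_1..h_5
s = np.random.randn(d)      # one decoder hidden state

# Multiplicative (dot-product) scoring: score_i = s . h_i
mult_scores = H @ s

# Additive scoring: score_i = v . tanh(W_h h_i + W_s s)
W_h = np.random.randn(d, d)
W_s = np.random.randn(d, d)
v = np.random.randn(d)
add_scores = np.tanh(H @ W_h.T + s @ W_s.T) @ v

# Softmax turns raw scores into attention weights that sum to 1
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

weights = softmax(mult_scores)
print(weights)
```

Either scoring function yields one relevance score per encoder position; the softmax step is what makes them usable as mixing weights.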
Key Topics
Encoder-decoder
Attention mechanism
Teacher forcing
Beam search
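Of the topics above, beam search is the easiest to demystify with a toy example. The sketch below keeps the `beam_width` best partial sequences at each step; the per-step probability table is fixed and made up (a real decoder would condition each step on the tokens chosen so far), so this shows only the bookkeeping, not the modeling.

```python
import numpy as np

# Toy per-step log-probabilities over a 3-token vocabulary, 3 steps.
# Purely illustrative values; not conditioned on previous tokens.
log_probs = np.log(np.array([
    [0.5, 0.4, 0.1],
    [0.1, 0.5, 0.4],
    [0.6, 0.3, 0.1],
]))

def beam_search(step_log_probs, beam_width=2):
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for step_lp in step_log_probs:
        # Extend every surviving sequence by every vocabulary token
        candidates = [
            (seq + [tok], score + step_lp[tok])
            for seq, score in beams
            for tok in range(len(step_lp))
        ]
        # Keep only the top-k highest-scoring partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

for seq, score in beam_search(log_probs):
    print(seq, round(score, 3))
```

With `beam_width=1` this degenerates to greedy decoding; widening the beam lets a sequence that starts with a lower-probability token win overall.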
Key Concepts
Encoder: Compresses input sequence into context vector
Decoder: Generates output sequence from context
Information bottleneck: Fixed-size context loses information
Attention mechanism: Dynamic weighting of encoder states
Attention scores: Relevance of each encoder position to decoder
Context vector: Weighted sum of encoder states (per decoder step)
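The last concept, the context vector as a weighted sum of encoder states, can be written in a few lines. This is a hedged sketch with toy dimensions and dot-product scoring (other scoring functions would work the same way downstream):

```python
import numpy as np

np.random.seed(1)
T, d = 6, 4
H = np.random.randn(T, d)   # encoder hidden states h_1..h_T
s = np.random.randn(d)      # current decoder state

scores = H @ s                           # relevance of each encoder position
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax -> attention weights

context = weights @ H                    # context vector: weighted sum of H
print(context.shape)  # (4,)
```

Because the weights are recomputed from the current decoder state `s`, the context vector changes at every decoder step, which is exactly what the fixed-size bottleneck of basic seq2seq could not do.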
Key Visualizations
Week4 Encoder Decoder Flow
Week4 Attention Heatmap
Week4 Beam Search Tree
Week4 Seq2Seq Architecture Minimalist