Sequence-to-Sequence

Encoder-Decoder Architecture

Part 2: Core Architectures (38 slides)

The Bottleneck Problem: How do you compress "The quick brown fox jumps over the lazy dog" into a single vector without losing meaning? You can't - and that's exactly why attention was invented. Instead of one summary, attention lets the decoder "look back" at every input word.
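The "look back" idea can be made concrete with a minimal NumPy sketch of soft (dot-product) attention: score each encoder state against the current decoder state, normalize with a softmax, and take the weighted sum as the context vector. The array values and shapes here are illustrative, not from the lecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_attention(decoder_state, encoder_states):
    """Dot-product attention: score every encoder state against the
    current decoder state, normalize, and return the weighted sum."""
    scores = encoder_states @ decoder_state   # one score per input position, shape (T,)
    weights = softmax(scores)                 # attention distribution, sums to 1
    context = weights @ encoder_states        # context vector, shape (d,)
    return context, weights

# Toy example: 4 encoder positions, hidden size 3 (random, for illustration).
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))   # encoder hidden states
s = rng.normal(size=3)        # current decoder state
context, weights = soft_attention(s, H)
```

Because the weights are recomputed for every decoder step, each output word gets its own context vector instead of one fixed summary of the whole input.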

Prerequisites

  • Week 3: LSTM networks and sequential processing
  • Understanding of encoder-decoder concepts
  • Matrix multiplication basics

Overview

Map input sequences to output sequences: the foundation of machine translation and summarization.

Learning Objectives

  • Explain the encoder-decoder architecture for sequence transformation
  • Identify the information bottleneck problem in basic seq2seq
  • Calculate attention scores between encoder and decoder states
  • Implement soft attention mechanism conceptually
  • Compare different attention variants (additive, multiplicative)
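The two attention variants named in the last objective differ only in how the score between a decoder state s and an encoder state h is computed. A hedged NumPy sketch, with illustrative random weight matrices (the dimensions and parameter names are assumptions, not lecture notation):

```python
import numpy as np

d = 3
rng = np.random.default_rng(1)
s = rng.normal(size=d)   # decoder state
h = rng.normal(size=d)   # one encoder state

# Multiplicative (Luong-style): score = s^T W h, a single bilinear form.
W = rng.normal(size=(d, d))
mult_score = s @ W @ h

# Additive (Bahdanau-style): score = v^T tanh(W1 s + W2 h),
# a small one-hidden-layer network over the two states.
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)
add_score = v @ np.tanh(W1 @ s + W2 @ h)
```

Either scalar score is then fed through a softmax across all encoder positions to produce the attention weights.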

Key Topics

  • Encoder-decoder
  • Attention mechanism
  • Teacher forcing
  • Beam search
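Of the topics above, beam search is the one used at decoding time: instead of greedily taking the single best token at each step, keep the k highest-probability partial sequences. A minimal sketch with a toy model whose per-step distributions do not depend on the prefix (a simplification for illustration only):

```python
import heapq
import math

def beam_search(step_probs, beam_width=2):
    """Keep the beam_width highest log-probability partial sequences.
    step_probs[t] maps each token to its probability at step t."""
    beams = [(0.0, [])]  # (cumulative log-prob, token sequence)
    for probs in step_probs:
        candidates = []
        for logp, seq in beams:
            for tok, p in probs.items():
                candidates.append((logp + math.log(p), seq + [tok]))
        beams = heapq.nlargest(beam_width, candidates)  # prune to the beam
    return beams

# Toy distributions over a 3-token vocabulary at each of 2 steps.
steps = [{"a": 0.5, "b": 0.4, "c": 0.1},
         {"a": 0.1, "b": 0.3, "c": 0.6}]
best = beam_search(steps, beam_width=2)  # best[0] is the top hypothesis
```

With beam_width=1 this reduces to greedy decoding; wider beams trade compute for a better chance of finding the highest-probability full sequence.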

Key Concepts

  • Encoder: compresses the input sequence into a context vector
  • Decoder: generates the output sequence from the context
  • Information bottleneck: a fixed-size context loses information
  • Attention mechanism: dynamic weighting of encoder states
  • Attention scores: relevance of each encoder position to the decoder
  • Context vector: weighted sum of encoder states (per decoder step)

Key Visualizations

Week4 Encoder Decoder Flow
Week4 Attention Heatmap
Week4 Beam Search Tree
Week4 Seq2Seq Architecture Minimalist

Resources

Moodle Resources (HS25)

Lecture Slides

Student Handouts