RNN & LSTM Networks

Sequential Processing

Part 1: Foundations (21 slides)

The Water Tank Analogy: Imagine designing a memory system for text. You need to remember important things (like the subject of a sentence), forget irrelevant details, and output the right information at the right time. LSTM solves this with three gates - like valves controlling a water tank.
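The three "valves" can be made concrete with a single LSTM time step. The following is a minimal NumPy sketch under assumed sizes and weight layout (one stacked weight matrix producing all four pre-activations), not the course's reference implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev; x] to four stacked
    pre-activations: forget, input, candidate, output."""
    z = W @ np.concatenate([h_prev, x]) + b
    f_z, i_z, g_z, o_z = np.split(z, 4)
    f = sigmoid(f_z)        # forget gate: how much of the tank to drain
    i = sigmoid(i_z)        # input gate: how much new water to let in
    g = np.tanh(g_z)        # candidate values (the new water)
    o = sigmoid(o_z)        # output gate: how much to release as output
    c = f * c_prev + i * g  # cell state: the water tank itself
    h = o * np.tanh(c)      # gated hidden state
    return h, c

# Tiny illustrative run: hidden size 3, input size 2, 5 time steps
rng = np.random.default_rng(0)
H, D = 3, 2
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Note how each gate is a sigmoid between 0 and 1, so it scales (rather than hard-switches) the information flowing through the cell.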

Prerequisites

  • Week 2: Word embeddings and vector representations
  • Basic neural network concepts (layers, activation functions)
  • Understanding of backpropagation helpful

Overview

Process sequences with recurrent neural networks. Understand vanishing gradients and LSTM gates.

Learning Objectives

  • Explain why sequential data needs special architectures
  • Identify the vanishing gradient problem in vanilla RNNs
  • Describe how LSTM gates control information flow
  • Trace information through forget, input, and output gates
  • Compare LSTM to GRU and understand their trade-offs

Key Topics

RNN architecture
Vanishing gradients
LSTM gates
Sequence modeling

Key Concepts

RNN (Recurrent Neural Network): Processes sequences with a hidden state
Vanishing gradients: Signal decay over long sequences
LSTM: Long Short-Term Memory with gated memory cells
Forget gate: Decides what information to discard
Input gate: Decides what new information to store
Output gate: Decides what information to output
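The vanishing-gradient concept above can be seen with simple arithmetic: backpropagating through T steps of a vanilla RNN multiplies T per-step Jacobians together, so if their typical scale is below 1 the gradient shrinks exponentially. A toy sketch (the 0.9 contraction factor is an illustrative assumption):

```python
# Toy illustration of vanishing gradients in a vanilla RNN.
# Each backward step multiplies the gradient by a Jacobian; if its
# typical scale is < 1, the product decays exponentially with depth.
T = 50
jacobian_scale = 0.9  # assumed per-step scale (illustrative)
grad = 1.0
for _ in range(T):
    grad *= jacobian_scale
print(f"gradient after {T} steps: {grad:.2e}")  # ~5e-03
```

The LSTM's additive cell update (c = f * c_prev + i * g) replaces this repeated multiplication with a gated sum, which is why gradients survive over much longer sequences.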

Key Visualizations

RNN Unrolled
Vanishing Gradient
LSTM Architecture
Gate Activation Heatmap

Resources

Moodle Resources (HS25)