RNN & LSTM Networks

Sequential Processing

Part 1: Foundations (21 slides)

The Water Tank Analogy: Imagine designing a memory system for text. You need to remember important things (like the subject of a sentence), forget irrelevant details, and output the right information at the right time. LSTM solves this with three gates - like valves controlling a water tank.
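The three "valves" can be made concrete with a single LSTM time step. The following is a minimal NumPy sketch under assumed sizes and weight layout (one stacked weight matrix producing all four pre-activations), not the course's reference implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev; x] to four stacked
    pre-activations: forget, input, candidate, output."""
    z = W @ np.concatenate([h_prev, x]) + b
    f_z, i_z, g_z, o_z = np.split(z, 4)
    f = sigmoid(f_z)        # forget gate: how much of the tank to drain
    i = sigmoid(i_z)        # input gate: how much new water to let in
    g = np.tanh(g_z)        # candidate values (the new water)
    o = sigmoid(o_z)        # output gate: how much to release as output
    c = f * c_prev + i * g  # cell state: the water tank itself
    h = o * np.tanh(c)      # gated hidden state
    return h, c

# Tiny illustrative run: hidden size 3, input size 2, 5 time steps
rng = np.random.default_rng(0)
H, D = 3, 2
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Note how each gate is a sigmoid between 0 and 1, so it scales (rather than hard-switches) the information flowing through the cell.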

Prerequisites

  • Week 2: Word embeddings and vector representations
  • Basic neural network concepts (layers, activation functions)
  • Understanding of backpropagation helpful

Overview

Process sequences with recurrent neural networks. Understand vanishing gradients and LSTM gates.

Learning Objectives

  • Explain why sequential data needs special architectures
  • Identify the vanishing gradient problem in vanilla RNNs
  • Describe how LSTM gates control information flow
  • Trace information through forget, input, and output gates
  • Compare LSTM to GRU and understand their trade-offs

Key Topics

RNN architecture
Vanishing gradients
LSTM gates
Sequence modeling

Key Concepts

RNN (Recurrent Neural Network): Processes sequences with a hidden state
Vanishing gradients: Signal decay over long sequences
LSTM: Long Short-Term Memory with gated memory cells
Forget gate: Decides what information to discard
Input gate: Decides what new information to store
Output gate: Decides what information to output
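The vanishing-gradient concept above can be seen with simple arithmetic: backpropagating through T steps of a vanilla RNN multiplies T per-step Jacobians together, so if their typical scale is below 1 the gradient shrinks exponentially. A toy sketch (the 0.9 contraction factor is an illustrative assumption):

```python
# Toy illustration of vanishing gradients in a vanilla RNN.
# Each backward step multiplies the gradient by a Jacobian; if its
# typical scale is < 1, the product decays exponentially with depth.
T = 50
jacobian_scale = 0.9  # assumed per-step scale (illustrative)
grad = 1.0
for _ in range(T):
    grad *= jacobian_scale
print(f"gradient after {T} steps: {grad:.2e}")  # ~5e-03
```

The LSTM's additive cell update (c = f * c_prev + i * g) replaces this repeated multiplication with a gated sum, which is why gradients survive over much longer sequences.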

Key Visualizations

RNN Unrolled
Vanishing Gradient
LSTM Architecture
Gate Activation Heatmap

Resources

Moodle Resources (HS25)