N-gram Language Models

Statistical Foundations of NLP

Part 1: Foundations (42 slides)

Same Model, Different Quality: Your phone keyboard uses language models to predict your next word. But how does it know that "I want to" is more likely to be followed by "go" than by "xylophone"? The answer lies in probability chains.

Prerequisites

  • Basic probability (conditional probability, Bayes' theorem)
  • High school mathematics
  • No programming required for lecture content

Overview

Build probabilistic language models using n-grams. Learn smoothing techniques and evaluation metrics.

Learning Objectives

  • Explain the connection between dice rolling and word prediction
  • Calculate n-gram probabilities from text corpora (see the sketch after this list)
  • Apply the Markov assumption to language modeling
  • Compute perplexity to evaluate model quality
  • Understand the smoothing problem and basic solutions
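
To make the second objective concrete, here is a minimal sketch of estimating bigram probabilities by maximum likelihood. The toy corpus, the <s>/</s> sentence markers, and the helper name p_mle are illustrative assumptions, not course code.

```python
# A minimal sketch, assuming a tiny hand-made corpus.
from collections import Counter

corpus = [
    ["<s>", "i", "want", "to", "go", "</s>"],
    ["<s>", "i", "want", "to", "eat", "</s>"],
    ["<s>", "i", "want", "to", "go", "home", "</s>"],
]

# Count single words and adjacent word pairs across all sentences.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))

def p_mle(w2, w1):
    # Maximum-likelihood estimate of P(w2 | w1): under the Markov
    # assumption, the next word depends only on the preceding word.
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_mle("go", "to"))         # 2/3: "to" is followed by "go" in 2 of 3 cases
print(p_mle("xylophone", "to"))  # 0.0: never observed, motivating smoothing
```

The zero probability for "xylophone" is the data-sparsity problem that the smoothing techniques below are designed to fix.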

Key Topics

  • Markov chains
  • Smoothing techniques
  • Perplexity
  • Language generation (sketched in code after this list)
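
As a taste of the last topic, the sketch below generates text by walking the bigram Markov chain. It reuses the assumed toy corpus from the previous sketch (repeated so the snippet runs on its own); generate is a hypothetical helper, not course code.

```python
# A toy sketch of language generation from bigram counts.
import random
from collections import Counter

corpus = [
    ["<s>", "i", "want", "to", "go", "</s>"],
    ["<s>", "i", "want", "to", "eat", "</s>"],
    ["<s>", "i", "want", "to", "go", "home", "</s>"],
]
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))

def generate(max_len=10):
    # Walk the Markov chain: starting at <s>, repeatedly sample the next
    # word with probability proportional to its bigram count.
    word, out = "<s>", []
    for _ in range(max_len):
        followers = [(w2, c) for (w1, w2), c in bigrams.items() if w1 == word]
        words, counts = zip(*followers)
        word = random.choices(words, weights=counts)[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

print(generate())  # e.g. "i want to go home"
```

Sampling from such a chain reproduces locally plausible phrases, much as the keyboard example above does at scale.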

Key Concepts

  • Language model: a probability distribution over word sequences
  • N-gram: a contiguous sequence of n items from a text
  • Markov assumption: the future depends only on the recent past (limited context)
  • Perplexity: how "surprised" a model is by test data (lower is better)
  • Smoothing: handling unseen n-grams (Laplace, Kneser-Ney); see the sketch after this list
  • Chain rule of probability: decomposing P(w1, w2, ..., wn) into a product of conditionals P(wi | w1, ..., wi-1)
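
The last three concepts fit together in a few lines of code. Below is a hedged sketch of add-one (Laplace) smoothing and a perplexity computation on the assumed toy corpus; Kneser-Ney requires discounted continuation counts and is left to the lecture. Function names are illustrative.

```python
# A hedged sketch: Laplace smoothing plus perplexity on a toy corpus.
import math
from collections import Counter

corpus = [
    ["<s>", "i", "want", "to", "go", "</s>"],
    ["<s>", "i", "want", "to", "eat", "</s>"],
    ["<s>", "i", "want", "to", "go", "home", "</s>"],
]
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size, including <s> and </s>

def p_laplace(w2, w1):
    # Add-one smoothing: every possible bigram gets a pseudo-count of 1,
    # so unseen pairs no longer receive probability zero.
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def perplexity(sentence):
    # Perplexity = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), where N is the
    # number of predicted words; lower means the model is less "surprised".
    logp = sum(math.log(p_laplace(w2, w1))
               for w1, w2 in zip(sentence, sentence[1:]))
    return math.exp(-logp / (len(sentence) - 1))

print(p_laplace("go", "i"))  # small but nonzero, though "i go" was never seen
print(perplexity(["<s>", "i", "want", "to", "eat", "</s>"]))
```

Truly out-of-vocabulary words would additionally need a device such as an <unk> token, which this sketch omits.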

Key Visualizations

  • N-gram Context Windows
  • Smoothing Comparison
  • Add-k Smoothing
  • Conditional Probability Tree

Resources

Moodle Resources (HS25)