N-gram Language Models

Statistical Foundations of NLP

Part 1: Foundations (42 slides)

Same Model, Different Quality: Your phone keyboard uses language models to predict your next word. But how does it know that "I want to" is more likely to be followed by "go" than by "xylophone"? The answer lies in probability chains.

Prerequisites

  • Basic probability (conditional probability, Bayes' theorem)
  • High school mathematics
  • No programming required for lecture content

Overview

Build probabilistic language models using n-grams. Learn smoothing techniques and evaluation metrics.

Learning Objectives

  • Explain the connection between dice rolling and word prediction
  • Calculate n-gram probabilities from text corpora (see the sketch after this list)
  • Apply the Markov assumption to language modeling
  • Compute perplexity to evaluate model quality
  • Understand the smoothing problem and basic solutions
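
To make the second objective concrete, here is a minimal sketch of estimating bigram probabilities by maximum likelihood. The toy corpus, the <s>/</s> sentence markers, and the helper name p_mle are illustrative assumptions, not course code.

```python
# A minimal sketch, assuming a tiny hand-made corpus.
from collections import Counter

corpus = [
    ["<s>", "i", "want", "to", "go", "</s>"],
    ["<s>", "i", "want", "to", "eat", "</s>"],
    ["<s>", "i", "want", "to", "go", "home", "</s>"],
]

# Count single words and adjacent word pairs across all sentences.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))

def p_mle(w2, w1):
    # Maximum-likelihood estimate of P(w2 | w1): under the Markov
    # assumption, the next word depends only on the preceding word.
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_mle("go", "to"))         # 2/3: "to" is followed by "go" in 2 of 3 cases
print(p_mle("xylophone", "to"))  # 0.0: never observed, motivating smoothing
```

The zero probability for "xylophone" is the data-sparsity problem that the smoothing techniques below are designed to fix.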

Key Topics

  • Markov chains
  • Smoothing techniques
  • Perplexity
  • Language generation (sketched in code after this list)
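
As a taste of the last topic, the sketch below generates text by walking the bigram Markov chain. It reuses the assumed toy corpus from the previous sketch (repeated so the snippet runs on its own); generate is a hypothetical helper, not course code.

```python
# A toy sketch of language generation from bigram counts.
import random
from collections import Counter

corpus = [
    ["<s>", "i", "want", "to", "go", "</s>"],
    ["<s>", "i", "want", "to", "eat", "</s>"],
    ["<s>", "i", "want", "to", "go", "home", "</s>"],
]
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))

def generate(max_len=10):
    # Walk the Markov chain: starting at <s>, repeatedly sample the next
    # word with probability proportional to its bigram count.
    word, out = "<s>", []
    for _ in range(max_len):
        followers = [(w2, c) for (w1, w2), c in bigrams.items() if w1 == word]
        words, counts = zip(*followers)
        word = random.choices(words, weights=counts)[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

print(generate())  # e.g. "i want to go home"
```

Sampling from such a chain reproduces locally plausible phrases, much as the keyboard example above does at scale.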

Key Concepts

  • Language model: a probability distribution over word sequences
  • N-gram: a contiguous sequence of n items from a text
  • Markov assumption: the future depends only on the recent past (limited context)
  • Perplexity: how "surprised" a model is by test data (lower is better)
  • Smoothing: handling unseen n-grams (Laplace, Kneser-Ney); see the sketch after this list
  • Chain rule of probability: decomposing P(w1, w2, ..., wn) into a product of conditionals P(wi | w1, ..., wi-1)
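
The last three concepts fit together in a few lines of code. Below is a hedged sketch of add-one (Laplace) smoothing and a perplexity computation on the assumed toy corpus; Kneser-Ney requires discounted continuation counts and is left to the lecture. Function names are illustrative.

```python
# A hedged sketch: Laplace smoothing plus perplexity on a toy corpus.
import math
from collections import Counter

corpus = [
    ["<s>", "i", "want", "to", "go", "</s>"],
    ["<s>", "i", "want", "to", "eat", "</s>"],
    ["<s>", "i", "want", "to", "go", "home", "</s>"],
]
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size, including <s> and </s>

def p_laplace(w2, w1):
    # Add-one smoothing: every possible bigram gets a pseudo-count of 1,
    # so unseen pairs no longer receive probability zero.
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def perplexity(sentence):
    # Perplexity = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), where N is the
    # number of predicted words; lower means the model is less "surprised".
    logp = sum(math.log(p_laplace(w2, w1))
               for w1, w2 in zip(sentence, sentence[1:]))
    return math.exp(-logp / (len(sentence) - 1))

print(p_laplace("go", "i"))  # small but nonzero, though "i go" was never seen
print(perplexity(["<s>", "i", "want", "to", "eat", "</s>"]))
```

Truly out-of-vocabulary words would additionally need a device such as an <unk> token, which this sketch omits.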

Key Visualizations

  • N-gram Context Windows
  • Smoothing Comparison
  • Add-k Smoothing
  • Conditional Probability Tree

Resources

Moodle Resources (HS25)