N-gram Language Models
Statistical Foundations of NLP
Part 1: Foundations (42 slides)
Same Model, Different Quality: Your phone keyboard uses a language model to predict your next word. But how does it know that "I want to" is more likely to be followed by "go" than by "xylophone"? The answer lies in chains of conditional probabilities.
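The keyboard idea can be sketched with a few lines of counting. This is a minimal illustration, not the lecture's actual method; the toy corpus below is an assumption chosen so that "go" follows "to" more often than anything else.

```python
from collections import Counter, defaultdict

# Toy training text (hypothetical); a real keyboard model trains on far more data.
corpus = "i want to go home i want to go out i want to eat".split()

# Count bigrams: how often each word follows each context word.
bigram_counts = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    bigram_counts[prev][curr] += 1

# Estimate P(next | "to") by relative frequency and rank candidates.
context = "to"
total = sum(bigram_counts[context].values())
for word, count in bigram_counts[context].most_common():
    print(word, count / total)
```

Because "go" follows "to" twice and "eat" once in this corpus, the model ranks "go" first; "xylophone" never follows "to", so its estimated probability is zero, which is exactly the unseen-n-gram problem that smoothing (later in this lecture) addresses.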
Prerequisites
- Basic probability (conditional probability, Bayes' theorem)
- High school mathematics
- No programming required for lecture content
Overview
Build probabilistic language models using n-grams. Learn smoothing techniques and evaluation metrics.
Learning Objectives
- Explain the connection between dice rolling and word prediction
- Calculate n-gram probabilities from text corpora
- Apply the Markov assumption to language modeling
- Compute perplexity to evaluate model quality
- Understand the smoothing problem and basic solutions
Key Topics
Markov chains
Smoothing techniques
Perplexity
Language generation
Key Concepts
- Language model: a probability distribution over word sequences
- N-gram: a contiguous sequence of n items from text
- Markov assumption: the future depends only on the recent past (limited context)
- Perplexity: how "surprised" a model is by test data (lower is better)
- Smoothing: handling unseen n-grams (Laplace, Kneser-Ney)
- Chain rule of probability: P(w1, ..., wn) = P(w1) P(w2 | w1) ... P(wn | w1, ..., wn-1)
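The smoothing and perplexity concepts above can be connected in one short sketch: an add-one (Laplace) smoothed unigram model scored on held-out text. The training sentence, test sentence, and unigram choice are all simplifying assumptions; the lecture's smoothing examples may use bigrams.

```python
import math
from collections import Counter

# Toy training and held-out data (assumptions for illustration).
train = "the cat sat on the mat".split()
test = "the dog sat".split()

counts = Counter(train)
vocab = set(train) | set(test)   # include test words so unseen "dog" is in V
V = len(vocab)
N = len(train)

def p_laplace(w):
    # Add-one smoothing: P(w) = (count(w) + 1) / (N + V),
    # so unseen words get small but nonzero probability.
    return (counts[w] + 1) / (N + V)

# Perplexity = exp of the negative average log-probability of the test words.
log_prob = sum(math.log(p_laplace(w)) for w in test)
perplexity = math.exp(-log_prob / len(test))
print(round(perplexity, 2))
```

Without smoothing, the unseen word "dog" would get probability zero and the perplexity would be infinite; add-one smoothing keeps every word's probability positive at the cost of shaving mass from seen words, which is the tension that motivates better schemes such as Kneser-Ney.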
Key Visualizations
N-gram Context Windows
Smoothing Comparison
Add-k Smoothing
Conditional Probability Tree