Chapters
Explore all 13 chapters covering the complete evolution of language modeling, from classical n-grams to modern large language models.
Foundations
Ch 1: Introduction: The Problem of Prediction
From Shannon to Modern Language Models
This foundational chapter establishes the mathematical framework for language modeling, connecting Claude Shannon's 1...
Ch 2: N-gram Language Models
Statistical Foundations of Sequence Prediction
Explores classical n-gram models, the sparsity problem, and sophisticated smoothing techniques that remain relevant i...
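As a taste of what the chapter covers, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, the simplest of the smoothing techniques discussed; all function names are illustrative, not from the book.

```python
from collections import defaultdict

def train_bigram_counts(tokens):
    """Count unigram (as-context) and bigram occurrences in a token list."""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    for w1, w2 in zip(tokens, tokens[1:]):
        unigrams[w1] += 1
        bigrams[(w1, w2)] += 1
    return unigrams, bigrams

def laplace_prob(w1, w2, unigrams, bigrams, vocab_size):
    """P(w2 | w1) with add-one smoothing: unseen bigrams get a small
    nonzero probability instead of zero."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)
```

Smoothing matters because any finite corpus leaves most bigrams unseen; without it, a single unseen pair would zero out the probability of an entire sentence.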
Ch 3: Tokenization
From Characters to Subwords
Examines how modern systems break text into tokens, from simple word splitting to sophisticated subword algorithms.
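The core of one such subword algorithm, byte-pair encoding (BPE), is the repeated merging of the most frequent adjacent symbol pair. A minimal sketch of a single merge step (the word-frequency dictionary and helper names are illustrative):

```python
from collections import Counter

def most_frequent_pair(words):
    """Find the most common adjacent symbol pair across the corpus.
    `words` maps a tuple of symbols to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged
```

Running these two steps in a loop until a target vocabulary size is reached yields the learned merge table that production tokenizers apply at inference time.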
Ch 4: Word Embeddings
Distributed Representations of Meaning
Introduces vector representations that capture semantic relationships between words.
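Semantic relationships between embedding vectors are typically measured with cosine similarity; a minimal sketch on plain Python lists:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors:
    1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm
```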
Neural Language Models
Ch 5: RNNs and LSTMs
Sequential Processing for Language
Covers recurrent architectures that process sequences one token at a time.
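The one-token-at-a-time update of a vanilla RNN can be sketched as a single function, h' = tanh(W_xh x + W_hh h + b), written out for plain lists (the parameter names are conventional, not from the book):

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One vanilla RNN step: new hidden state from input x and
    previous hidden state h. W_xh is hidden-by-input, W_hh is
    hidden-by-hidden, b is the hidden-sized bias."""
    return [math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x)))
                + sum(W_hh[i][j] * h[j] for j in range(len(h)))
                + b[i])
            for i in range(len(h))]
```

LSTMs replace this single tanh update with gated updates to combat vanishing gradients, but the sequential token-by-token structure is the same.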
Ch 6: Transformers
Attention Is All You Need
The architecture that revolutionized NLP: a detailed mathematical treatment of the transformer.
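The heart of that treatment is scaled dot-product attention; a minimal single-head sketch on plain lists, assuming Q, K, V are lists of equal-dimension vectors:

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention: for each query, softmax over
    query-key similarities, then a weighted average of the values."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)                      # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```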
Ch 7: Decoding Strategies
From Probabilities to Text
How to generate text from probability distributions: the art and science of decoding.
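Two of the workhorse techniques covered here, temperature scaling and top-k sampling, combine into a short routine; a minimal sketch (function name and interface are illustrative):

```python
import math
import random

def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample a token index: keep the k highest logits, divide by
    temperature, renormalize with a softmax, then draw one index."""
    top = sorted(range(len(logits)), key=lambda i: logits[i],
                 reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for idx, e in zip(top, exps):
        cum += e / total
        if r < cum:
            return idx
    return top[-1]
```

With k=1 this reduces to greedy decoding; raising the temperature flattens the distribution and makes lower-ranked tokens more likely.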
Ch 8: Training Language Models
Optimization at Scale
The practical challenges of training neural language models on large datasets.
Large Language Models
Ch 9: Large Language Models
GPT, BERT, and Beyond
The modern era of billion-parameter models and their surprising capabilities.
Ch 10: Scaling Laws
When Bigger Is Better
Mathematical relationships between model size, data, compute, and performance.
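A representative example of such a relationship is the parametric loss fit from Hoffmann et al. (2022): L(N, D) = E + A/N^alpha + B/D^beta, where N is parameter count and D is training tokens. A minimal sketch, with the published constants used as illustrative defaults:

```python
def scaling_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pretraining loss as a function of parameters N and
    training tokens D. E is the irreducible loss; the power-law terms
    shrink as model and data grow."""
    return E + A / N**alpha + B / D**beta
```

The key qualitative point: loss falls predictably as a power law in both N and D, which lets practitioners budget compute before training.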
Ch 11: Post-Training
Alignment and Fine-Tuning
Making language models helpful, harmless, and honest through alignment techniques.
Efficiency and Applications
Ch 12: Efficient Language Models
Doing More with Less
Techniques for making large models practical: compression, sparsity, and efficiency.
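The simplest compression technique in this family is post-training quantization; a minimal sketch of symmetric int8 quantization with a single scale factor (names are illustrative):

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one
    symmetric scale factor derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized values."""
    return [qi * scale for qi in q]
```

Storing 8-bit integers plus one scale cuts memory roughly 4x versus float32, at the cost of a bounded rounding error per weight.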
Ch 13: Applications
Language Models in Practice
Real-world applications of language models across domains.
Topics Covered
Foundations
- Information theory and entropy
- N-gram models and smoothing
- Tokenization algorithms
- Word embeddings
Neural Architectures
- RNNs and LSTMs
- Transformer architecture
- Attention mechanisms
- Decoding strategies
Modern LLMs
- Scaling laws
- RLHF and alignment
- Efficiency techniques
- Real-world applications
(c) Joerg Osterrieder 2025