Chapters
Explore all 13 chapters covering the complete evolution of language modeling, from classical n-grams to modern large language models.
Foundations
Ch 1: Introduction: The Problem of Prediction
From Shannon to Modern Language Models
This foundational chapter establishes the mathematical framework for language modeling, connecting Claude Shannon's 1...
Ch 2: N-gram Language Models
Statistical Foundations of Sequence Prediction
Explores classical n-gram models, the sparsity problem, and sophisticated smoothing techniques that remain relevant i...
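As a taste of what the chapter covers, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing, the simplest of the smoothing techniques discussed; all function names are illustrative, not from the book.

```python
from collections import defaultdict

def train_bigram_counts(tokens):
    """Count unigram (as-context) and bigram occurrences in a token list."""
    unigrams = defaultdict(int)
    bigrams = defaultdict(int)
    for w1, w2 in zip(tokens, tokens[1:]):
        unigrams[w1] += 1
        bigrams[(w1, w2)] += 1
    return unigrams, bigrams

def laplace_prob(w1, w2, unigrams, bigrams, vocab_size):
    """P(w2 | w1) with add-one smoothing: unseen bigrams get a small
    nonzero probability instead of zero."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)
```

Smoothing matters because any finite corpus leaves most bigrams unseen; without it, a single unseen pair would zero out the probability of an entire sentence.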
Ch 3: Tokenization
From Characters to Subwords
Examines how modern systems break text into tokens, from simple word splitting to sophisticated subword algorithms.
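The core of one such subword algorithm, byte-pair encoding (BPE), is the repeated merging of the most frequent adjacent symbol pair. A minimal sketch of a single merge step (the word-frequency dictionary and helper names are illustrative):

```python
from collections import Counter

def most_frequent_pair(words):
    """Find the most common adjacent symbol pair across the corpus.
    `words` maps a tuple of symbols to its corpus frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged
```

Running these two steps in a loop until a target vocabulary size is reached yields the learned merge table that production tokenizers apply at inference time.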
Ch 4: Word Embeddings
Distributed Representations of Meaning
Introduces vector representations that capture semantic relationships between words.
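Semantic relationships between embedding vectors are typically measured with cosine similarity; a minimal sketch on plain Python lists:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors:
    1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm
```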
Neural Language Models
Ch 5: RNNs and LSTMs
Sequential Processing for Language
Covers recurrent architectures that process sequences one token at a time.
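The one-token-at-a-time update of a vanilla RNN can be sketched as a single function, h' = tanh(W_xh x + W_hh h + b), written out for plain lists (the parameter names are conventional, not from the book):

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One vanilla RNN step: new hidden state from input x and
    previous hidden state h. W_xh is hidden-by-input, W_hh is
    hidden-by-hidden, b is the hidden-sized bias."""
    return [math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x)))
                + sum(W_hh[i][j] * h[j] for j in range(len(h)))
                + b[i])
            for i in range(len(h))]
```

LSTMs replace this single tanh update with gated updates to combat vanishing gradients, but the sequential token-by-token structure is the same.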
Ch 6: Transformers
Attention Is All You Need
The architecture that revolutionized NLP: a detailed mathematical treatment of the transformer.
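The heart of that treatment is scaled dot-product attention; a minimal single-head sketch on plain lists, assuming Q, K, V are lists of equal-dimension vectors:

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention: for each query, softmax over
    query-key similarities, then a weighted average of the values."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)                      # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```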
Ch 7: Decoding Strategies
From Probabilities to Text
How to generate text from probability distributions: the art and science of decoding.
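Two of the workhorse techniques covered here, temperature scaling and top-k sampling, combine into a short routine; a minimal sketch (function name and interface are illustrative):

```python
import math
import random

def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample a token index: keep the k highest logits, divide by
    temperature, renormalize with a softmax, then draw one index."""
    top = sorted(range(len(logits)), key=lambda i: logits[i],
                 reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for idx, e in zip(top, exps):
        cum += e / total
        if r < cum:
            return idx
    return top[-1]
```

With k=1 this reduces to greedy decoding; raising the temperature flattens the distribution and makes lower-ranked tokens more likely.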
Ch 8: Training Language Models
Optimization at Scale
The practical challenges of training neural language models on large datasets.
Large Language Models
Ch 9: Large Language Models
GPT, BERT, and Beyond
The modern era of billion-parameter models and their surprising capabilities.
Ch 10: Scaling Laws
When Bigger Is Better
Mathematical relationships between model size, data, compute, and performance.
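A representative example of such a relationship is the parametric loss fit from Hoffmann et al. (2022): L(N, D) = E + A/N^alpha + B/D^beta, where N is parameter count and D is training tokens. A minimal sketch, with the published constants used as illustrative defaults:

```python
def scaling_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pretraining loss as a function of parameters N and
    training tokens D. E is the irreducible loss; the power-law terms
    shrink as model and data grow."""
    return E + A / N**alpha + B / D**beta
```

The key qualitative point: loss falls predictably as a power law in both N and D, which lets practitioners budget compute before training.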
Ch 11: Post-Training
Alignment and Fine-Tuning
Making language models helpful, harmless, and honest through alignment techniques.
Efficiency and Applications
Ch 12: Efficient Language Models
Doing More with Less
Techniques for making large models practical: compression, sparsity, and efficiency.
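The simplest compression technique in this family is post-training quantization; a minimal sketch of symmetric int8 quantization with a single scale factor (names are illustrative):

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one
    symmetric scale factor derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from quantized values."""
    return [qi * scale for qi in q]
```

Storing 8-bit integers plus one scale cuts memory roughly 4x versus float32, at the cost of a bounded rounding error per weight.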
Ch 13: Applications
Language Models in Practice
Real-world applications of language models across domains.
Topics Covered
Foundations
- Information theory and entropy
- N-gram models and smoothing
- Tokenization algorithms
- Word embeddings
Neural Architectures
- RNNs and LSTMs
- Transformer architecture
- Attention mechanisms
- Decoding strategies
Modern LLMs
- Scaling laws
- RLHF and alignment
- Efficiency techniques
- Real-world applications
(c) Joerg Osterrieder 2025