Chapters

Explore all 13 chapters covering the complete evolution of language modeling, from classical n-grams to modern large language models.

Foundations

Complete · 28 figures
Ch 1: Introduction: The Problem of Prediction

From Shannon to Modern Language Models

This foundational chapter establishes the mathematical framework for language modeling, connecting Claude Shannon's 1948 information theory to modern language models.

Complete · 27 figures
Ch 2: N-gram Language Models

Statistical Foundations of Sequence Prediction

Explores classical n-gram models, the sparsity problem, and sophisticated smoothing techniques that remain relevant today.

Complete · 26 figures
Ch 3: Tokenization

From Characters to Subwords

Examines how modern systems break text into tokens, from simple word splitting to sophisticated subword algorithms.

Complete · 28 figures
Ch 4: Word Embeddings

Distributed Representations of Meaning

Introduces vector representations that capture semantic relationships between words.

Neural Language Models

Complete · 28 figures
Ch 5: RNNs and LSTMs

Sequential Processing for Language

Covers recurrent architectures that process sequences one token at a time.

Complete · 32 figures
Ch 6: Transformers

Attention Is All You Need

The architecture that revolutionized NLP: a detailed mathematical treatment of the transformer.

Complete · 27 figures
Ch 7: Decoding Strategies

From Probabilities to Text

How to generate text from probability distributions: the art and science of decoding.

Planned · 27 figures
Ch 8: Training Language Models

Optimization at Scale

The practical challenges of training neural language models on large datasets.

Large Language Models

Planned · 32 figures
Ch 9: Large Language Models

GPT, BERT, and Beyond

The modern era of billion-parameter models and their surprising capabilities.

Planned · 28 figures
Ch 10: Scaling Laws

When Bigger Is Better

Mathematical relationships between model size, data, compute, and performance.

Planned · 27 figures
Ch 11: Post-Training

Alignment and Fine-Tuning

Making language models helpful, harmless, and honest through alignment techniques.

Efficiency and Applications

Planned · 27 figures
Ch 12: Efficient Language Models

Doing More with Less

Techniques for making large models practical: compression, sparsity, and efficiency.

Planned · 27 figures
Ch 13: Applications

Language Models in Practice

Real-world applications of language models across domains.

Topics Covered

Foundations
  • Information theory and entropy
  • N-gram models and smoothing
  • Tokenization algorithms
  • Word embeddings
Neural Architectures
  • RNNs and LSTMs
  • Transformer architecture
  • Attention mechanisms
  • Decoding strategies
Modern LLMs
  • Scaling laws
  • RLHF and alignment
  • Efficiency techniques
  • Real-world applications

© Joerg Osterrieder 2025