Chapter Dependencies & Reading Order
Understanding the prerequisite relationships between chapters is essential both for the writing process and for readers who wish to take non-linear paths through the book. This page maps every dependency, highlights the critical path, and suggests several alternative reading orders.
Interdependency Graph
Solid arrows represent hard prerequisites: a reader must have completed the source chapter before the target is accessible. Dashed arrows indicate soft dependencies where familiarity helps but is not mandatory. Chapter nodes colored in red receive deep treatment (≥25 pages with derivations).
Critical Path
The longest chain of hard prerequisites determines the minimum sequential reading required to reach the most advanced material. That path is:
Ch 1 (Introduction) → Ch 2 (Math Foundations) → Ch 4 (Word Representations) → Ch 5 (Sequence Models) → Ch 6 (Attention) → Ch 7 (Seq2Seq & Decoding) → Ch 8 (Transformer) → Ch 9 (Pre-training) → Ch 12 (Alignment)
This nine-chapter spine covers approximately 200 pages and takes a reader from probability basics through RLHF. (Ch 3 and Ch 4 yield equally long chains, since both are hard prerequisites of Ch 5.) Every other chapter branches off this spine and can be read in parallel once its prerequisites on the spine are met.
Full Dependency Table
The table below reads in both directions: it lists what each chapter requires and which later chapters it unlocks. Chapters marked (DEEP) receive extended treatment with derivations and implementation examples.
| Ch | Title | Hard Prerequisites | Required By |
|---|---|---|---|
| 1 | Introduction | — | All |
| 2 | Math Foundations | Ch 1 | Ch 3, Ch 4 |
| 3 | Classical LMs | Ch 1, Ch 2 | Ch 5 |
| 4 | Word Representations | Ch 1, Ch 2 | Ch 5 |
| 5 | Sequence Models | Ch 1, Ch 3, Ch 4 | Ch 6 |
| 6 | Attention (DEEP) | Ch 1, Ch 5 | Ch 7, Ch 8 |
| 7 | Seq2Seq & Decoding | Ch 1, Ch 6 | Ch 8 |
| 8 | Transformer (DEEP) | Ch 1, Ch 6, Ch 7 | Ch 9, Ch 10 |
| 9 | Pre-training (DEEP) | Ch 1, Ch 8 | Ch 11, Ch 12, Ch 13 |
| 10 | Tokenization | Ch 1, Ch 8 | — |
| 11 | Scaling Laws | Ch 1, Ch 9 | — |
| 12 | Alignment (DEEP) | Ch 1, Ch 9 | — |
| 13 | ICL & Prompting | Ch 1, Ch 9 | — |
| 14 | RAG & Agents | Ch 1 | — |
| 15 | Ethics & Future | Ch 1 | — |
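The critical path can be checked mechanically against this table. The sketch below (Python; the hard-prerequisite edges are transcribed from the table, and `longest_chain` is an illustrative helper name, not something defined in the book) finds the longest chain of hard prerequisites ending at Ch 12 by memoized search over the DAG:

```python
from functools import lru_cache

# Hard-prerequisite edges transcribed from the table above:
# hard_prereqs[ch] = chapters that must precede chapter ch.
hard_prereqs = {
    1: [], 2: [1], 3: [1, 2], 4: [1, 2], 5: [1, 3, 4],
    6: [1, 5], 7: [1, 6], 8: [1, 6, 7], 9: [1, 8],
    10: [1, 8], 11: [1, 9], 12: [1, 9], 13: [1, 9],
    14: [1], 15: [1],
}

@lru_cache(maxsize=None)
def longest_chain(ch):
    """Longest chain of hard prerequisites ending at `ch`, inclusive."""
    preds = hard_prereqs[ch]
    if not preds:
        return (ch,)
    # The graph is acyclic, so recursion terminates; lru_cache makes
    # the search linear in the number of edges.
    return max((longest_chain(p) for p in preds), key=len) + (ch,)

# Prints a nine-chapter chain from Ch 1 to Ch 12.
print(" -> ".join(f"Ch {c}" for c in longest_chain(12)))
```

Because Ch 3 and Ch 4 are tied in chain length, the printed spine is one of two equally long alternatives.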
Suggested Reading Paths
Not every reader needs to follow the linear chapter order. The dependency structure supports several focused paths depending on background and goals.
Sequential (Full Book)
Chapters 1 through 15 in order. Recommended for graduate students taking a full-semester course. Approximately 335 pages, 15–16 weeks at one chapter per week.
Fast Track (Core Theory)
Ch 1 → Ch 2 → Ch 6 → Ch 8 → Ch 9 → Ch 12
Six chapters (~155 pages). Skips the historical build-up — and with it several hard prerequisites (Ch 3–5 and Ch 7) — to jump directly to attention, transformers, pre-training, and alignment. Assumes mathematical maturity. Best for readers with prior exposure to deep learning who want to understand LLMs specifically.
Practitioner Path
Ch 1 → Ch 8 → Ch 9 → Ch 12 → Ch 13 → Ch 14
Six chapters (~150 pages). Focuses on the transformer architecture and on how models are trained, aligned, prompted, and deployed. Minimal theory. Ideal for software engineers building on top of LLMs who need working intuition without full mathematical derivations.
Theory Path
Ch 1 → Ch 2 → Ch 3 → Ch 4 → Ch 5 → Ch 6 → Ch 8 → Ch 11
Eight chapters (~185 pages). Emphasizes the mathematical progression from count-based models through neural sequence models to scaling laws. Suitable for researchers interested in the theoretical underpinnings of language modeling.
Ethics & Policy Path
Ch 1 → Ch 9 → Ch 12 → Ch 15
Four chapters (~90 pages). Provides enough technical context to understand alignment challenges and their societal implications. Designed for policy analysts, journalists, and non-technical stakeholders.
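Readers assembling their own path can also derive, for any target chapter, the complete set of hard prerequisites in a valid reading order. A minimal sketch (Python; edges transcribed from the dependency table, and `reading_order` is a hypothetical helper name):

```python
# Hard-prerequisite edges transcribed from the dependency table.
hard_prereqs = {
    1: [], 2: [1], 3: [1, 2], 4: [1, 2], 5: [1, 3, 4],
    6: [1, 5], 7: [1, 6], 8: [1, 6, 7], 9: [1, 8],
    10: [1, 8], 11: [1, 9], 12: [1, 9], 13: [1, 9],
    14: [1], 15: [1],
}

def reading_order(targets):
    """All hard prerequisites of `targets` (targets included), in
    ascending chapter order. Ascending order is a valid topological
    order here because every chapter depends only on earlier ones."""
    needed = set()

    def visit(ch):
        # Depth-first walk collecting the transitive prerequisite closure.
        if ch not in needed:
            for p in hard_prereqs[ch]:
                visit(p)
            needed.add(ch)

    for t in targets:
        visit(t)
    return sorted(needed)

print(reading_order([12]))      # -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 12]
print(reading_order([14, 15]))  # -> [1, 14, 15]
```

Running it for Ch 12 shows that ten chapters are strictly required before Alignment; the Fast Track and Practitioner paths shorten this only by assuming the reader brings equivalent background.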