Predicting the Next Words: From N-grams to LLMs


Language Modeling from Shannon to GPT

A Springer textbook exploring how machines learn to predict the next word — from classical n-grams through neural networks to modern large language models.

Explore the Book →

What You'll Learn

Part I: Foundations

Chapters 1–3

Master the mathematical tools for evaluating language models — probability, information theory, perplexity — and understand how classical n-gram models set the stage.
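As a small taste of Part I, perplexity can be computed directly from the per-token probabilities a model assigns to held-out text. The sketch below is a minimal illustration (not taken from the book): it exponentiates the average negative log-probability, so a uniform 4-way guess yields perplexity 4.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigns to each token in a held-out sequence."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to each of 4 tokens is
# "as uncertain" as a fair 4-way choice: perplexity ≈ 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model was less surprised by the test data; Chapter 2 develops why this is the standard intrinsic metric for language models.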

Part II: Neural Language Models

Chapters 4–7

From word embeddings to attention mechanisms, trace how neural networks transformed language modeling with distributed representations and sequence processing.

Part III: The Transformer Revolution

Chapters 8–11

Dive deep into the Transformer architecture, pre-training paradigms like BERT and GPT, tokenization strategies, and the scaling laws that drive modern AI.
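At the heart of Part III is scaled dot-product attention, softmax(QKᵀ/√d)V. The following is a bare-bones NumPy sketch for orientation only (the book's chapters develop the full multi-head, masked version):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Three 4-dimensional token vectors attending to one another
# (self-attention: queries, keys, and values all come from X).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(X, X, X).shape)  # (3, 4)
```

Each output row is a convex combination of the value vectors, weighted by query-key similarity; the √d scaling keeps the softmax from saturating as dimensionality grows.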

Part IV: Frontiers

Chapters 12–15

Explore alignment (RLHF, DPO), in-context learning, retrieval-augmented generation, agents, and the ethical challenges of language AI.

The Intellectual Arc

1948 Shannon
1980s N-grams
2003 Neural LMs
2013 Embeddings
2015 Attention
2017 Transformers
2018–23 LLMs
2023+ Alignment & Responsible AI
15 Chapters · 4 Parts · ~340 Pages · ~87 Figures · 50+ Papers

Start Exploring

Chapter Drafts (15 of 15 — COMPLETE)

Full chapter content with math rendering and code examples. View all drafts →

Ch 1: Introduction · 10,740 words · Read → | PDF ↓
Ch 2: Mathematical Foundations · 10,759 words · Read → | PDF ↓
Ch 3: Classical Language Models · 8,514 words · Read → | PDF ↓
Ch 4: Word Representations · 9,613 words · Read → | PDF ↓
Ch 5: Sequence Models · 11,081 words · Read → | PDF ↓
Ch 6: The Attention Revolution · 10,902 words · Read → | PDF ↓
Ch 7: Sequence-to-Sequence and Decoding · 10,519 words · Read → | PDF ↓
Ch 8: The Transformer Architecture · 12,877 words · Read → | PDF ↓
Ch 9: Pre-training Paradigms · 13,529 words · Read → | PDF pending
Ch 10: Tokenization and Data at Scale · 9,990 words · Read → | PDF ↓
Ch 11: Scaling Laws and Emergent Abilities · 12,334 words · Read → | PDF pending
Ch 12: Alignment: RLHF, DPO, Safety · 11,220 words · Read → | PDF ↓
Ch 13: In-Context Learning, Prompting, and Reasoning · 12,595 words · Read → | PDF ↓
Ch 14: Retrieval, Agents, and Multimodal Models · 11,107 words · Read → | PDF ↓
Ch 15: Ethics, Society, and the Future · 8,034 words · Read → | PDF ↓
Project Status: This site documents the planning framework for the textbook. The master questionnaire, chapter-level writing prompts, dependency graph, and timeline are all actively maintained, and content generation follows the phased writing order defined in that framework.