Predicting the Next Words: From N-grams to LLMs


Language Modeling from Shannon to GPT

A Springer textbook exploring how machines learn to predict the next word — from classical n-grams through neural networks to modern large language models.

Explore the Book →

What You'll Learn

Part I: Foundations

Chapters 1–3

Master the mathematical tools for evaluating language models — probability, information theory, perplexity — and understand how classical n-gram models set the stage.
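As a small taste of Part I, perplexity can be computed directly from the per-token probabilities a model assigns to held-out text. The sketch below is a minimal illustration (not taken from the book): it exponentiates the average negative log-probability, so a uniform 4-way guess yields perplexity 4.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigns to each token in a held-out sequence."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to each of 4 tokens is
# "as uncertain" as a fair 4-way choice: perplexity ≈ 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model was less surprised by the test data; Chapter 2 develops why this is the standard intrinsic metric for language models.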

Part II: Neural Language Models

Chapters 4–7

From word embeddings to attention mechanisms, trace how neural networks transformed language modeling with distributed representations and sequence processing.

Part III: The Transformer Revolution

Chapters 8–11

Dive deep into the Transformer architecture, pre-training paradigms like BERT and GPT, tokenization strategies, and the scaling laws that drive modern AI.
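At the heart of Part III is scaled dot-product attention, softmax(QKᵀ/√d)V. The following is a bare-bones NumPy sketch for orientation only (the book's chapters develop the full multi-head, masked version):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Three 4-dimensional token vectors attending to one another
# (self-attention: queries, keys, and values all come from X).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(X, X, X).shape)  # (3, 4)
```

Each output row is a convex combination of the value vectors, weighted by query-key similarity; the √d scaling keeps the softmax from saturating as dimensionality grows.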

Part IV: Frontiers

Chapters 12–15

Explore alignment (RLHF, DPO), in-context learning, retrieval-augmented generation, agents, and the ethical challenges of language AI.

The Intellectual Arc

1948 Shannon
1980s N-grams
2003 Neural LMs
2013 Embeddings
2015 Attention
2017 Transformers
2018–23 LLMs
2023+ Alignment & Responsible AI
15 Chapters · 4 Parts · ~340 Pages · ~87 Figures · 50+ Papers

Start Exploring

Chapter Drafts (15 of 15 — COMPLETE)

Full chapter content with math rendering and code examples. View all drafts →

Ch 1: Introduction · 10,740 words · Read → | PDF ↓
Ch 2: Mathematical Foundations · 10,759 words · Read → | PDF ↓
Ch 3: Classical Language Models · 8,514 words · Read → | PDF ↓
Ch 4: Word Representations · 9,613 words · Read → | PDF ↓
Ch 5: Sequence Models · 11,081 words · Read → | PDF ↓
Ch 6: The Attention Revolution · 10,902 words · Read → | PDF ↓
Ch 7: Sequence-to-Sequence and Decoding · 10,519 words · Read → | PDF ↓
Ch 8: The Transformer Architecture · 12,877 words · Read → | PDF ↓
Ch 9: Pre-training Paradigms · 13,529 words · Read → | PDF pending
Ch 10: Tokenization and Data at Scale · 9,990 words · Read → | PDF ↓
Ch 11: Scaling Laws and Emergent Abilities · 12,334 words · Read → | PDF pending
Ch 12: Alignment: RLHF, DPO, Safety · 11,220 words · Read → | PDF ↓
Ch 13: In-Context Learning, Prompting, and Reasoning · 12,595 words · Read → | PDF ↓
Ch 14: Retrieval, Agents, and Multimodal Models · 11,107 words · Read → | PDF ↓
Ch 15: Ethics, Society, and the Future · 8,034 words · Read → | PDF ↓
Project Status: This site documents the planning framework for the textbook. The master questionnaire, chapter-level writing prompts, dependency graph, and timeline are all actively maintained, and content generation follows the phased writing order defined in that framework.