Predicting the Next Word
A Mathematical Foundation of Language Models
A comprehensive PhD-level textbook covering the complete evolution of language modeling, from Shannon's 1948 information theory to modern large language models.
13 Chapters · 364 Figures · 380 Pages
Book Structure
Part I: Foundations
- Introduction
- N-gram Models
- Tokenization
- Embeddings
Part II: Neural LMs
- RNNs & LSTMs
- Transformers
- Decoding
- Training
Part III: LLMs
- Large LMs
- Scaling Laws
- Post-Training
Part IV: Applications
- Efficiency
- Applications
Featured Visualizations
Over 360 publication-quality figures with Python source code
3D Entropy Surface
Information Theory Visualization
Interactive 3D surface showing entropy across probability distributions.
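A surface like this can be sketched in a few lines of Python (illustrative only, not the book's actual figure code): Shannon entropy H(p) = -Σ pᵢ log₂ pᵢ is evaluated over the 2-simplex of three-outcome distributions (p₁, p₂, 1 − p₁ − p₂), yielding a height field over (p₁, p₂).

```python
import math

def entropy(p):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Height field over the 2-simplex: distributions (p1, p2, 1 - p1 - p2).
# Plotting `surface` over (p1, p2) gives an entropy surface of this kind.
step = 0.02
grid = [i * step for i in range(1, 50)]
surface = {
    (p1, p2): entropy([p1, p2, 1 - p1 - p2])
    for p1 in grid for p2 in grid if p1 + p2 < 1
}
```

The peak sits at the uniform distribution (1/3, 1/3, 1/3), where entropy reaches log₂ 3 ≈ 1.585 bits.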
Chapter 1

Smoothing Comparison
N-gram Smoothing Techniques
Comparison of Laplace, Kneser-Ney, and interpolation methods.
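The simplest of these methods, add-k (Laplace for k = 1) smoothing, can be sketched as follows (a toy illustration under assumed names, not the book's figure code): unseen bigrams receive a small nonzero probability by adding k pseudo-counts to every event.

```python
from collections import Counter

def laplace_bigram_prob(bigram_counts, unigram_counts, vocab_size, w1, w2, k=1.0):
    # Add-k smoothed bigram probability P(w2 | w1); k=1 is Laplace smoothing.
    return (bigram_counts.get((w1, w2), 0) + k) / (
        unigram_counts.get(w1, 0) + k * vocab_size
    )

corpus = "the cat sat on the mat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size: 5

p_seen = laplace_bigram_prob(bigrams, unigrams, V, "the", "cat")    # 2/7
p_unseen = laplace_bigram_prob(bigrams, unigrams, V, "the", "sat")  # 1/7
```

Kneser-Ney instead discounts observed counts and backs off to a continuation probability, which is why the two methods diverge most visibly on rare and unseen n-grams.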
Chapter 2

BPE Algorithm
Byte Pair Encoding
Step-by-step visualization of subword tokenization.
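The core BPE loop visualized here can be sketched in Python (a minimal toy version on an assumed corpus, not the book's figure code): repeatedly find the most frequent adjacent symbol pair across the vocabulary and merge it into a new subword symbol.

```python
from collections import Counter

def most_frequent_pair(vocab):
    # Count adjacent symbol pairs across all words (word -> frequency map).
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(vocab, pair):
    # Replace every adjacent occurrence of `pair` with the merged symbol.
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Toy corpus: words pre-split into characters, with frequencies.
vocab = {tuple("low"): 5, tuple("lower"): 2, tuple("lowest"): 3}
for _ in range(3):  # three merge steps: l+o, lo+w, low+e
    vocab = merge_pair(vocab, most_frequent_pair(vocab))
```

After three merges, "low" has become a single symbol and "lower"/"lowest" share the learned prefix "lowe".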
Chapter 3

Book Progress
Track the development of this comprehensive textbook.
- Chapters Complete: 7 / 13
- Figures Generated: 196 / 364
- Overall Completion: 54%
© Joerg Osterrieder 2025