Predicting the Next Word

A Mathematical Foundation of Language Models

A comprehensive PhD-level textbook covering the complete evolution of language modeling, from Shannon's 1948 information theory to modern large language models.

13 Chapters · 364 Figures · 380 Pages

About the Author

Prof. Dr. Joerg Osterrieder

Professor of Finance and Data Science

FHGR - University of Applied Sciences of the Grisons, Switzerland. He specializes in the intersection of quantitative finance, machine learning, and natural language processing.


Book Structure

Part I: Foundations
  • Introduction
  • N-gram Models
  • Tokenization
  • Embeddings
Part II: Neural LMs
  • RNNs & LSTMs
  • Transformers
  • Decoding
  • Training
Part III: LLMs
  • Large LMs
  • Scaling Laws
  • Post-Training
Part IV: Applications
  • Efficiency
  • Applications

Book Progress

Track the development of this comprehensive textbook.

Chapters complete: 7 / 13
Figures generated: 196 / 364
Overall completion: 54%

(c) Joerg Osterrieder 2025