
Natural-Language-Processing

NLP Course 2025: From N-grams to Transformers - Complete 12-week curriculum with discovery-based pedagogy



Information

| Property | Value |
|---|---|
| Language | Jupyter Notebook |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 37 |
| License | MIT License |
| Created | 2025-11-22 |
| Last Updated | 2026-01-08 |
| Last Push | 2025-12-21 |
| Contributors | 2 |
| Default Branch | main |
| Visibility | public |

Notebooks

This repository contains 47 notebooks:

| Notebook | Language | Type |
|---|---|---|
| llm_summarization_lab | Python | Jupyter |
| week01_ngrams_lab | Python | Jupyter |
| week02_word_embeddings_lab | Python | Jupyter |
| week03_rnn_lab | Python | Jupyter |
| week03_rnn_lab_enhanced | Python | Jupyter |
| week04_part1_basic_seq2seq | Python | Jupyter |
| week04_part2_attention | Python | Jupyter |
| week04_part3_advanced | Python | Jupyter |
| week04_seq2seq_lab | Python | Jupyter |
| week04_seq2seq_lab_enhanced | Python | Jupyter |
| week05_transformer_lab | Python | Jupyter |
| week06_bert_finetuning | Python | Jupyter |
| week06_pretrained_feature_extraction | Python | Jupyter |
| week07_advanced_transformers_lab | Python | Jupyter |
| week08_tokenization_lab | Python | Jupyter |
| week09_decoding_lab | Python | Jupyter |
| week09_decoding_simplified | Python | Jupyter |
| week10_finetuning_lab | Python | Jupyter |
| week11_efficiency_lab | Python | Jupyter |
| week12_ethics_lab | Python | Jupyter |
| demo_agent_multistep | Python | Jupyter |
| demo_rag_simple | Python | Jupyter |
| demo_reasoning_compare | Python | Jupyter |
| decoding | Python | Jupyter |
| efficiency | Python | Jupyter |
| embeddings | Python | Jupyter |
| ethics | Python | Jupyter |
| finetuning | Python | Jupyter |
| ngrams | Python | Jupyter |
| pretrained | Python | Jupyter |
| rnn-lstm | Python | Jupyter |
| scaling | Python | Jupyter |
| seq2seq | Python | Jupyter |
| tokenization | Python | Jupyter |
| transformers | Python | Jupyter |
| discovery_notebook | Python | Jupyter |
| word_embeddings_3d_msc | Python | Jupyter |
| ngrams_Alice_in_Wonderland | Python | Jupyter |
| shakespeare_sonnets_simple_bsc | Python | Jupyter |
| 1_simple_ngrams | Python | Jupyter |
| 2_word_embeddings | Python | Jupyter |
| 3_simple_neural_net | Python | Jupyter |
| 4_compare_NLP_methods | Python | Jupyter |
| 5_Tokens Journey Through a Transformer | Python | Jupyter |
| 6_Transformers in 3D A Visual Journey | Python | Jupyter |
| 7_Transformers_in_3d_simplified | Python | Jupyter |
| 8_How_Transformers_Learn_Training_in_3D | Python | Jupyter |

Datasets

This repository includes 33 datasets:

| Dataset | Format | Size |
|---|---|---|
| data | | 0.0 KB |
| moodle_topic_mapping.json | .json | 8.07 KB |
| manifest.json | .json | 15.98 KB |
| link_report_20251208_0935.csv | .csv | 34.57 KB |
| link_report_20251208_0935.json | .json | 62.65 KB |
| search.json | .json | 6.16 KB |
| action_items.json | .json | 52.46 KB |
| chart_catalog.json | .json | 136.8 KB |
| comprehensive_fix_log.json | .json | 1.68 KB |
| course_overview.json | .json | 17.15 KB |
| embeddings.json | .json | 191.95 KB |
| fix_log.json | .json | 13.38 KB |
| lstm_primer.json | .json | 88.36 KB |
| master_catalog.json | .json | 2760.1 KB |
| nn_primer.json | .json | 206.31 KB |
| sentiment.json | .json | 98.77 KB |
| summarization.json | .json | 129.25 KB |
| week00.json | .json | 94.15 KB |
| week01.json | .json | 139.94 KB |
| week02.json | .json | 122.56 KB |
| week03.json | .json | 119.77 KB |
| week04.json | .json | 155.54 KB |
| week05.json | .json | 133.55 KB |
| week06.json | .json | 209.05 KB |
| week07.json | .json | 133.77 KB |
| week08.json | .json | 32.24 KB |
| week09.json | .json | 174.14 KB |
| week10.json | .json | 186.1 KB |
| week11.json | .json | 193.21 KB |
| week12.json | .json | 126.63 KB |
| moodle_data.json | .json | 35.12 KB |
| layout_report.json | .json | 8.32 KB |
| verification_results.json | .json | 20.89 KB |

Reproducibility

This repository includes reproducibility tools:

  • Python requirements.txt

  • Conda environment.yml

  • Makefile for automation

Latest Release

  • Version: latest-lectures
  • Name: NLP Course - All Lectures
  • Published: 2025-12-12

Status

  • Issues: Enabled
  • Wiki: Disabled
  • Pages: Enabled

README

NLP Course 2025: From N-grams to Transformers


QuantLet-Compatible Course Materials


A comprehensive Natural Language Processing course covering statistical foundations through modern transformer architectures. Build ChatGPT from scratch!

Quick Start (3 Steps)

```bash
# 1. Clone the repository
git clone https://github.com/josterri/2025_NLP_Lectures.git
cd 2025_NLP_Lectures

# 2. Install dependencies
pip install -r requirements.txt

# 3. Start learning!
jupyter lab NLP_slides/week02_neural_lm/lab/week02_word_embeddings_lab.ipynb
```

What You'll Learn

This course takes you from foundational statistical methods to state-of-the-art neural architectures:

  • Weeks 1-2: Statistical language models and word embeddings (Word2Vec, GloVe)
  • Weeks 3-4: Sequential models (RNN/LSTM) and sequence-to-sequence with attention
  • Weeks 5-7: Transformers, BERT, GPT, and advanced architectures
  • Weeks 8-10: Tokenization, decoding strategies, and fine-tuning
  • Weeks 11-12: Efficiency optimization and ethical AI deployment

By the end, you'll build a working transformer from scratch and understand the architecture behind ChatGPT and Claude.
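The statistical starting point (Weeks 1-2) can be sketched in a few lines of plain Python: a bigram language model with add-one smoothing, scored by perplexity. The toy corpus and test sentence below are purely illustrative, not taken from the course labs:

```python
import math
from collections import Counter

# Toy corpus; the labs use real text (e.g. Alice in Wonderland).
corpus = "the cat sat on the mat the cat ate".split()

# Count bigrams and the unigram contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])
vocab_size = len(set(corpus))

def bigram_prob(w1, w2, alpha=1.0):
    """P(w2 | w1) with add-alpha (Laplace) smoothing."""
    return (bigrams[(w1, w2)] + alpha) / (contexts[w1] + alpha * vocab_size)

# Perplexity of a held-out sentence: 2 ** (average negative log2 probability).
test = "the cat sat".split()
log_prob = sum(math.log2(bigram_prob(w1, w2)) for w1, w2 in zip(test, test[1:]))
perplexity = 2 ** (-log_prob / (len(test) - 1))
print(round(perplexity, 2))
```

Lower perplexity means the model finds the test text less surprising; Week 1 develops this measure in full.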

Course Structure

Core Materials (12 Weeks)

Each week includes:

  • Presentation: LaTeX/Beamer slides with optimal readability
  • Lab Notebook: Interactive Jupyter notebook with hands-on exercises
  • Handouts: Pre-class discovery exercises and post-class technical practice

Supplementary Modules

  • Neural Network Primer: Zero pre-knowledge intro to neural networks
  • LSTM Primer: Comprehensive deep dive into LSTM architecture (32 slides)
  • Embeddings Module: Standalone word embedding module with 3D visualizations
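The standalone embeddings module centers on vector arithmetic over word embeddings. A minimal sketch of the classic king - man + woman ≈ queen analogy, using made-up 2-d vectors (real labs load trained Word2Vec/GloVe embeddings; the values here are purely illustrative):

```python
import numpy as np

# Hypothetical 2-d embeddings: dimension 0 ~ gender, dimension 1 ~ royalty.
emb = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The analogy: king - man + woman should land closest to queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w != "king"), key=lambda w: cosine(target, emb[w]))
print(best)
```

With trained embeddings the analogy is approximate rather than exact, which the module's 3D visualizations make vivid.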

Total Content

  • 60+ presentations (including versions and supplements)
  • 12 interactive lab notebooks
  • 40+ handout documents
  • 100+ Python-generated figures
  • 8 progressive visualization notebooks

Prerequisites

Required:

  • Python 3.8 or higher
  • Basic linear algebra (vectors, matrices)
  • Basic probability theory
  • Comfortable with Python programming

Helpful but not required:

  • PyTorch experience
  • Understanding of backpropagation
  • Machine learning fundamentals

New to neural networks? Start with our Neural Network Primer module before Week 2.

Installation

Option 1: pip

```bash
pip install -r requirements.txt
```

Option 2: conda

```bash
conda env create -f environment.yml
conda activate nlp2025
```

GPU Support

For GPU acceleration (recommended for Weeks 5+):

```bash
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
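After installing, a quick sanity check confirms whether PyTorch can see the GPU; the labs fall back to CPU otherwise. A minimal check that also handles a missing torch install gracefully:

```python
# Report which device the labs will run on.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "torch not installed"
print(device)
```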

See INSTALLATION.md for detailed setup instructions and troubleshooting.

Course Navigation

Week-by-Week Guide

Full navigation with topics, prerequisites, and learning objectives: COURSE_INDEX.md

Week Highlights

| Week | Topic | Key Concepts | Lab |
|---|---|---|---|
| 1 | Foundations | N-grams, perplexity, statistical LM | - |
| 2 | Word Embeddings | Word2Vec, GloVe, neural LM | Implement embeddings |
| 3 | RNN/LSTM | Sequential models, BPTT | Build LSTM from scratch |
| 4 | Seq2Seq | Attention mechanism, translation | Machine translation |
| 5 | Transformers | Self-attention, multi-head | Build transformer |
| 6 | Pre-trained | BERT, GPT, transfer learning | Fine-tune BERT |
| 7 | Advanced | T5, GPT-3, scaling laws | Experiment with GPT |
| 8 | Tokenization | BPE, WordPiece, SentencePiece | Implement tokenizer |
| 9 | Decoding | Beam, sampling, nucleus, contrastive | Compare 6 methods |
| 10 | Fine-tuning | LoRA, prompt engineering | Adapt models |
| 11 | Efficiency | Quantization, distillation | Optimize models |
| 12 | Ethics | Bias, fairness, safety | Measure bias |
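The core operation behind Week 5's "Build transformer" lab is scaled dot-product attention. A NumPy sketch of the standard formula, softmax(QK^T / sqrt(d_k))V, with random weights and a 4-token input (shapes and values purely illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape, attn.sum(axis=-1))                 # each token's weights sum to 1
```

Multi-head attention, covered in the same lab, runs several such maps in parallel over sliced projections and concatenates the results.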

Quantlet Charts

All Python-generated visualizations follow the Quantlet standard format with:

  • Numbered folders (01_chart_name/, 02_chart_name/, etc.)
  • Self-contained Python scripts
  • Standard metainfo.txt with description, keywords, and usage

Final Lecture Charts

See FinalLecture/ for 8 Quantlet-formatted visualizations covering:

  • Vector database architecture
  • HNSW nearest neighbor search
  • RAG conditional probabilities
  • Hybrid search flow

Project Structure

```text
├── FinalLecture/                # Quantlet-formatted charts (Final Lecture)
├── logo/                        # Quantlet branding
├── NLP_slides/
│   ├── week01_foundations/      # Week 1: Statistical LM
│   ├── week02_neural_lm/        # Week 2: Word embeddings
│   ├── week03_rnn/              # Week 3: RNN/LSTM/GRU
│   ├── ...                      # Weeks 4-12
│   ├── nn_primer/               # Neural network primer
│   ├── lstm_primer/             # LSTM deep dive
│   └── common/                  # Shared templates and utils
├── embeddings/                  # Standalone embeddings module
├── exercises/                   # Additional practice
├── figures/                     # Shared visualizations
├── requirements.txt             # Python dependencies
├── environment.yml              # Conda environment
└── COURSE_INDEX.md              # Full course navigation
```

Key Learning Milestones

  • After Week 2: Understand and implement word embeddings
  • After Week 3: Build RNN and LSTM from scratch
  • After Week 5: Comprehend transformer architecture completely
  • After Week 6: Fine-tune pre-trained models (BERT, GPT)
  • After Week 9: Control text generation quality and diversity
  • After Week 12: Deploy models responsibly with ethical considerations
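The Week 9 milestone, controlling generation quality and diversity, can be illustrated with nucleus (top-p) sampling, one of the decoding strategies the Week 9 lab compares. A sketch over a toy next-token distribution (the probabilities are made up for illustration):

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample from the smallest set of tokens whose cumulative probability >= p."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]                 # token ids, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1            # size of the nucleus
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()  # renormalize inside the nucleus
    return int(rng.choice(nucleus, p=renorm))

probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])     # toy next-token distribution
token = nucleus_sample(probs, p=0.9, rng=np.random.default_rng(0))
print(token)
```

With p=0.9 the nucleus here is the top three tokens; the unlikely tail is never sampled, which is how nucleus sampling trades diversity against degenerate output.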

Usage Examples

Run a Lab Notebook

```bash
# Option A: start Jupyter Lab and browse to a week's lab folder
jupyter lab

# Option B: open a specific week's lab directly
cd NLP_slides/week05_transformers/lab
jupyter notebook week05_transformer_lab.ipynb
```

Compile a Presentation

```bash
cd NLP_slides/week02_neural_lm/presentations
pdflatex week02_neural_lm.tex
```

Generate Figures

```bash
cd NLP_slides/week05_transformers/python
python generate_week05_optimal_charts.py
```

Testing the Course

Test all lab notebooks for execution:

```bash
python test_notebooks.py
```

This validates that all 12 lab notebooks execute correctly in your environment.

Course Delivery Options

Standard 12-Week Semester

  • One week per topic
  • Weekly labs and assignments
  • Suitable for undergraduate/graduate courses

Intensive 8-Week Course

  • Combine Weeks 1-2, skip some advanced topics
  • Accelerated pace for bootcamps
  • Focus on core transformer concepts

Self-Paced Learning

  • Progress at your own speed
  • Complete prerequisite modules first
  • Focus on labs and hands-on practice

Support and Resources

  • Issues: Report problems at GitHub Issues
  • Prerequisites: Check the Neural Network Primer if you're new to deep learning
  • GPU Requirements: Most labs work on CPU; Weeks 5+ benefit from GPU

Contributing

Contributions are welcome! Areas for contribution:

  • Additional exercises and examples
  • Translations to other languages
  • MSc-level challenge problems
  • Bug fixes and improvements

License

This course is released under the MIT License. See LICENSE for details.

Acknowledgments

Course materials developed with a pedagogical focus on:

  • Discovery-based learning
  • Concrete-to-abstract progression
  • Hands-on implementation
  • Real-world applications

Built with LaTeX/Beamer, Python, PyTorch, and Jupyter.

Citation

If you use these materials in your course or research, please cite:

```bibtex
@misc{nlp2025course,
  title={NLP Course 2025: From N-grams to Transformers},
  author={Joerg Osterrieder},
  year={2025},
  url={https://github.com/josterri/2025_NLP_Lectures}
}
```

Ready to start? Check INSTALLATION.md for setup, then dive into Week 2's word embeddings lab!

Questions? See COURSE_INDEX.md for complete navigation and prerequisites.
