Natural-Language-Processing
NLP Course 2025: From N-grams to Transformers - Complete 12-week curriculum with discovery-based pedagogy
Information
| Property | Value |
|---|---|
| Language | Jupyter Notebook |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 37 |
| License | MIT License |
| Created | 2025-11-22 |
| Last Updated | 2026-01-08 |
| Last Push | 2025-12-21 |
| Contributors | 2 |
| Default Branch | main |
| Visibility | public |
Notebooks
This repository contains 47 notebooks:
| Notebook | Language | Type |
|---|---|---|
| llm_summarization_lab | PYTHON | jupyter |
| week01_ngrams_lab | PYTHON | jupyter |
| week02_word_embeddings_lab | PYTHON | jupyter |
| week03_rnn_lab | PYTHON | jupyter |
| week03_rnn_lab_enhanced | PYTHON | jupyter |
| week04_part1_basic_seq2seq | PYTHON | jupyter |
| week04_part2_attention | PYTHON | jupyter |
| week04_part3_advanced | PYTHON | jupyter |
| week04_seq2seq_lab | PYTHON | jupyter |
| week04_seq2seq_lab_enhanced | PYTHON | jupyter |
| week05_transformer_lab | PYTHON | jupyter |
| week06_bert_finetuning | PYTHON | jupyter |
| week06_pretrained_feature_extraction | PYTHON | jupyter |
| week07_advanced_transformers_lab | PYTHON | jupyter |
| week08_tokenization_lab | PYTHON | jupyter |
| week09_decoding_lab | PYTHON | jupyter |
| week09_decoding_simplified | PYTHON | jupyter |
| week10_finetuning_lab | PYTHON | jupyter |
| week11_efficiency_lab | PYTHON | jupyter |
| week12_ethics_lab | PYTHON | jupyter |
| demo_agent_multistep | PYTHON | jupyter |
| demo_rag_simple | PYTHON | jupyter |
| demo_reasoning_compare | PYTHON | jupyter |
| decoding | PYTHON | jupyter |
| efficiency | PYTHON | jupyter |
| embeddings | PYTHON | jupyter |
| ethics | PYTHON | jupyter |
| finetuning | PYTHON | jupyter |
| ngrams | PYTHON | jupyter |
| pretrained | PYTHON | jupyter |
| rnn-lstm | PYTHON | jupyter |
| scaling | PYTHON | jupyter |
| seq2seq | PYTHON | jupyter |
| tokenization | PYTHON | jupyter |
| transformers | PYTHON | jupyter |
| discovery_notebook | PYTHON | jupyter |
| word_embeddings_3d_msc | PYTHON | jupyter |
| ngrams_Alice_in_Wonderland | PYTHON | jupyter |
| shakespeare_sonnets_simple_bsc | PYTHON | jupyter |
| 1_simple_ngrams | PYTHON | jupyter |
| 2_word_embeddings | PYTHON | jupyter |
| 3_simple_neural_net | PYTHON | jupyter |
| 4_compare_NLP_methods | PYTHON | jupyter |
| 5_Tokens Journey Through a Transformer | PYTHON | jupyter |
| 6_Transformers in 3D A Visual Journey | PYTHON | jupyter |
| 7_Transformers_in_3d_simplified | PYTHON | jupyter |
| 8_How_Transformers_Learn_Training_in_3D | PYTHON | jupyter |
Datasets
This repository includes 33 datasets:
| Dataset | Format | Size |
|---|---|---|
| data/ (directory) | - | 0.0 KB |
| moodle_topic_mapping.json | .json | 8.07 KB |
| manifest.json | .json | 15.98 KB |
| link_report_20251208_0935.csv | .csv | 34.57 KB |
| link_report_20251208_0935.json | .json | 62.65 KB |
| search.json | .json | 6.16 KB |
| action_items.json | .json | 52.46 KB |
| chart_catalog.json | .json | 136.8 KB |
| comprehensive_fix_log.json | .json | 1.68 KB |
| course_overview.json | .json | 17.15 KB |
| embeddings.json | .json | 191.95 KB |
| fix_log.json | .json | 13.38 KB |
| lstm_primer.json | .json | 88.36 KB |
| master_catalog.json | .json | 2760.1 KB |
| nn_primer.json | .json | 206.31 KB |
| sentiment.json | .json | 98.77 KB |
| summarization.json | .json | 129.25 KB |
| week00.json | .json | 94.15 KB |
| week01.json | .json | 139.94 KB |
| week02.json | .json | 122.56 KB |
| week03.json | .json | 119.77 KB |
| week04.json | .json | 155.54 KB |
| week05.json | .json | 133.55 KB |
| week06.json | .json | 209.05 KB |
| week07.json | .json | 133.77 KB |
| week08.json | .json | 32.24 KB |
| week09.json | .json | 174.14 KB |
| week10.json | .json | 186.1 KB |
| week11.json | .json | 193.21 KB |
| week12.json | .json | 126.63 KB |
| moodle_data.json | .json | 35.12 KB |
| layout_report.json | .json | 8.32 KB |
| verification_results.json | .json | 20.89 KB |
Reproducibility
This repository includes reproducibility tools:
- Python requirements.txt
- Conda environment.yml
- Makefile for automation
Latest Release
- Version: latest-lectures
- Name: NLP Course - All Lectures
- Published: 2025-12-12
Status
- Issues: Enabled
- Wiki: Disabled
- Pages: Enabled
README
NLP Course 2025: From N-grams to Transformers
QuantLet-Compatible Course Materials
A comprehensive Natural Language Processing course covering statistical foundations through modern transformer architectures. Build a ChatGPT-style transformer from scratch!
Quick Start (3 Steps)
```bash
# 1. Clone the repository
git clone https://github.com/josterri/2025_NLP_Lectures.git
cd 2025_NLP_Lectures

# 2. Install dependencies
pip install -r requirements.txt

# 3. Start learning!
jupyter lab NLP_slides/week02_neural_lm/lab/week02_word_embeddings_lab.ipynb
```
What You'll Learn
This course takes you from foundational statistical methods to state-of-the-art neural architectures:
- Weeks 1-2: Statistical language models and word embeddings (Word2Vec, GloVe)
- Weeks 3-4: Sequential models (RNN/LSTM) and sequence-to-sequence with attention
- Weeks 5-7: Transformers, BERT, GPT, and advanced architectures
- Weeks 8-10: Tokenization, decoding strategies, and fine-tuning
- Weeks 11-12: Efficiency optimization and ethical AI deployment
By the end, you'll build a working transformer from scratch and understand the architecture behind ChatGPT and Claude.
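The course starts from statistical language models, so as a taste of the Week 1 material, here is a minimal bigram model sketch (the toy corpus and function names are illustrative, not taken from the course notebooks):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count bigram frequencies from a list of token lists."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for prev, word in zip(sentence, sentence[1:]):
            counts[prev][word] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "dog" only once
```

The Week 1 lab builds on exactly this kind of counting, then adds smoothing and perplexity evaluation.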
Course Structure
Core Materials (12 Weeks)
Each week includes:
- Presentation: LaTeX/Beamer slides designed for readability
- Lab Notebook: an interactive Jupyter notebook with hands-on exercises
- Handouts: pre-class discovery exercises and post-class technical practice
Supplementary Modules
- Neural Network Primer: Zero pre-knowledge intro to neural networks
- LSTM Primer: Comprehensive deep dive into LSTM architecture (32 slides)
- Embeddings Module: Standalone word embedding module with 3D visualizations
Total Content
- 60+ presentations (including versions and supplements)
- 12 interactive lab notebooks
- 40+ handout documents
- 100+ Python-generated figures
- 8 progressive visualization notebooks
Prerequisites
- Required:
  - Python 3.8 or higher
  - Basic linear algebra (vectors, matrices)
  - Basic probability theory
  - Comfort with Python programming
- Helpful but not required:
  - PyTorch experience
  - Understanding of backpropagation
  - Machine learning fundamentals
New to neural networks? Start with our Neural Network Primer module before Week 2.
Installation
Option 1: pip (Recommended)
Option 2: conda
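The command blocks for the two options appear to have been stripped from this copy; given the requirements.txt and environment.yml at the repository root, the standard forms would be:

```shell
# Option 1: pip
pip install -r requirements.txt

# Option 2: conda (the environment name is defined inside environment.yml)
conda env create -f environment.yml
conda activate <env-name>   # replace with the name from environment.yml
```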
GPU Support
For GPU acceleration (recommended for Weeks 5+):
```bash
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
See INSTALLATION.md for detailed setup instructions and troubleshooting.
Course Navigation
Week-by-Week Guide
Full navigation with topics, prerequisites, and learning objectives: COURSE_INDEX.md
Week Highlights
| Week | Topic | Key Concepts | Lab |
|---|---|---|---|
| 1 | Foundations | N-grams, perplexity, statistical LM | - |
| 2 | Word Embeddings | Word2Vec, GloVe, neural LM | Implement embeddings |
| 3 | RNN/LSTM | Sequential models, BPTT | Build LSTM from scratch |
| 4 | Seq2Seq | Attention mechanism, translation | Machine translation |
| 5 | Transformers | Self-attention, multi-head | Build transformer |
| 6 | Pre-trained | BERT, GPT, transfer learning | Fine-tune BERT |
| 7 | Advanced | T5, GPT-3, scaling laws | Experiment with GPT |
| 8 | Tokenization | BPE, WordPiece, SentencePiece | Implement tokenizer |
| 9 | Decoding | Beam, sampling, nucleus, contrastive | Compare 6 methods |
| 10 | Fine-tuning | LoRA, prompt engineering | Adapt models |
| 11 | Efficiency | Quantization, distillation | Optimize models |
| 12 | Ethics | Bias, fairness, safety | Measure bias |
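Week 9's comparison of decoding strategies comes down to how the next token is chosen from the model's distribution. A stdlib-only toy showing greedy decoding versus temperature sampling (the tokens and logits are illustrative, not from the lab):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(tokens, logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return tokens[logits.index(max(logits))]

def sample(tokens, logits, temperature=1.0, rng=random):
    """Temperature sampling: draw from the rescaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

tokens = ["cat", "dog", "fish"]
logits = [2.0, 1.0, 0.1]
print(greedy(tokens, logits))       # always "cat"
print(sample(tokens, logits, 0.7))  # usually "cat", sometimes "dog" or "fish"
```

Beam search, nucleus (top-p), and contrastive decoding, covered in the Week 9 lab, are refinements of this same choose-the-next-token step.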
Quantlet Charts
All Python-generated visualizations follow the Quantlet standard format with:
- Numbered folders (01_chart_name/, 02_chart_name/, etc.)
- Self-contained Python scripts
- Standard metainfo.txt with description, keywords, and usage
Final Lecture Charts
See FinalLecture/ for 8 Quantlet-formatted visualizations covering:
- Vector database architecture
- HNSW nearest-neighbor search
- RAG conditional probabilities
- Hybrid search flow
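At their core, the vector-database and RAG ideas visualized in the final lecture reduce to similarity search over embedding vectors. A brute-force stdlib sketch (HNSW, which the charts cover, is an approximate and much faster version of this; the toy 3-d "embeddings" below are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query, index):
    """Brute-force nearest neighbor: highest cosine similarity wins."""
    return max(index, key=lambda doc: cosine(query, index[doc]))

# Toy 3-d vectors; real embeddings have hundreds of dimensions
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.1, 0.9, 0.0],
    "doc_fish": [0.0, 0.1, 0.9],
}
print(nearest([0.8, 0.2, 0.1], index))  # "doc_cats"
```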
Project Structure
├── FinalLecture/ # Quantlet-formatted charts (Final Lecture)
├── logo/ # Quantlet branding
├── NLP_slides/
│ ├── week01_foundations/ # Week 1: Statistical LM
│ ├── week02_neural_lm/ # Week 2: Word embeddings
│ ├── week03_rnn/ # Week 3: RNN/LSTM/GRU
│ ├── ... # Weeks 4-12
│ ├── nn_primer/ # Neural network primer
│ ├── lstm_primer/ # LSTM deep dive
│ └── common/ # Shared templates and utils
├── embeddings/ # Standalone embeddings module
├── exercises/ # Additional practice
├── figures/ # Shared visualizations
├── requirements.txt # Python dependencies
├── environment.yml # Conda environment
└── COURSE_INDEX.md # Full course navigation
Key Learning Milestones
- ✅ After Week 2: Understand and implement word embeddings
- ✅ After Week 3: Build RNN and LSTM from scratch
- ✅ After Week 5: Comprehend transformer architecture completely
- ✅ After Week 6: Fine-tune pre-trained models (BERT, GPT)
- ✅ After Week 9: Control text generation quality and diversity
- ✅ After Week 12: Deploy models responsibly with ethical considerations
Usage Examples
Run a Lab Notebook
```bash
# Start Jupyter Lab from the repository root
jupyter lab

# Or navigate to a week's lab folder and open its notebook directly
cd NLP_slides/week05_transformers/lab
jupyter notebook week05_transformer_lab.ipynb
```
Compile a Presentation
Generate Figures
Testing the Course
Test all lab notebooks for execution:
This validates that all 12 lab notebooks execute correctly in your environment.
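The exact test command was not preserved in this copy; one common way to check that a notebook executes end-to-end is nbconvert's --execute mode (the path below is illustrative):

```shell
# Execute one lab notebook in place; the command fails if any cell errors
jupyter nbconvert --to notebook --execute --inplace \
    NLP_slides/week05_transformers/lab/week05_transformer_lab.ipynb
```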
Course Delivery Options
Standard 12-Week Semester
- One week per topic
- Weekly labs and assignments
- Suitable for undergraduate/graduate courses
Intensive 8-Week Course
- Combine Weeks 1-2, skip some advanced topics
- Accelerated pace for bootcamps
- Focus on core transformer concepts
Self-Paced Learning
- Progress at your own speed
- Complete prerequisite modules first
- Focus on labs and hands-on practice
Documentation
- COURSE_INDEX.md - Complete week-by-week navigation
- INSTALLATION.md - Detailed setup instructions
- CLAUDE.md - Development guide and conventions
- status.md - Project status and completion tracking
- changelog.md - Change history
Support and Resources
- Issues: Report problems at GitHub Issues
- Prerequisites: Check the Neural Network Primer if you're new to deep learning
- GPU Requirements: Most labs work on CPU; Weeks 5+ benefit from GPU
Contributing
Contributions are welcome! Areas for contribution:
- Additional exercises and examples
- Translations to other languages
- MSc-level challenge problems
- Bug fixes and improvements
License
This course is released under the MIT License. See LICENSE for details.
Acknowledgments
Course materials developed with a pedagogical focus on:
- Discovery-based learning
- Concrete-to-abstract progression
- Hands-on implementation
- Real-world applications
Built with LaTeX/Beamer, Python, PyTorch, and Jupyter.
Citation
If you use these materials in your course or research, please cite:
```bibtex
@misc{nlp2025course,
  title  = {NLP Course 2025: From N-grams to Transformers},
  author = {Joerg Osterrieder},
  year   = {2025},
  url    = {https://github.com/josterri/2025_NLP_Lectures}
}
```
Ready to start? Check INSTALLATION.md for setup, then dive into Week 2's word embeddings lab!
Questions? See COURSE_INDEX.md for complete navigation and prerequisites.