
Natural-Language-Processing

NLP Course 2025: From N-grams to Transformers - Complete 12-week curriculum with discovery-based pedagogy



Information

| Property | Value |
|---|---|
| Language | Jupyter Notebook |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 37 |
| License | MIT License |
| Created | 2025-11-22 |
| Last Updated | 2026-01-08 |
| Last Push | 2025-12-21 |
| Contributors | 2 |
| Default Branch | main |
| Visibility | public |

Notebooks

This repository contains 47 notebooks:

| Notebook | Language | Type |
|---|---|---|
| llm_summarization_lab | Python | Jupyter |
| week01_ngrams_lab | Python | Jupyter |
| week02_word_embeddings_lab | Python | Jupyter |
| week03_rnn_lab | Python | Jupyter |
| week03_rnn_lab_enhanced | Python | Jupyter |
| week04_part1_basic_seq2seq | Python | Jupyter |
| week04_part2_attention | Python | Jupyter |
| week04_part3_advanced | Python | Jupyter |
| week04_seq2seq_lab | Python | Jupyter |
| week04_seq2seq_lab_enhanced | Python | Jupyter |
| week05_transformer_lab | Python | Jupyter |
| week06_bert_finetuning | Python | Jupyter |
| week06_pretrained_feature_extraction | Python | Jupyter |
| week07_advanced_transformers_lab | Python | Jupyter |
| week08_tokenization_lab | Python | Jupyter |
| week09_decoding_lab | Python | Jupyter |
| week09_decoding_simplified | Python | Jupyter |
| week10_finetuning_lab | Python | Jupyter |
| week11_efficiency_lab | Python | Jupyter |
| week12_ethics_lab | Python | Jupyter |
| demo_agent_multistep | Python | Jupyter |
| demo_rag_simple | Python | Jupyter |
| demo_reasoning_compare | Python | Jupyter |
| decoding | Python | Jupyter |
| efficiency | Python | Jupyter |
| embeddings | Python | Jupyter |
| ethics | Python | Jupyter |
| finetuning | Python | Jupyter |
| ngrams | Python | Jupyter |
| pretrained | Python | Jupyter |
| rnn-lstm | Python | Jupyter |
| scaling | Python | Jupyter |
| seq2seq | Python | Jupyter |
| tokenization | Python | Jupyter |
| transformers | Python | Jupyter |
| discovery_notebook | Python | Jupyter |
| word_embeddings_3d_msc | Python | Jupyter |
| ngrams_Alice_in_Wonderland | Python | Jupyter |
| shakespeare_sonnets_simple_bsc | Python | Jupyter |
| 1_simple_ngrams | Python | Jupyter |
| 2_word_embeddings | Python | Jupyter |
| 3_simple_neural_net | Python | Jupyter |
| 4_compare_NLP_methods | Python | Jupyter |
| 5_Tokens Journey Through a Transformer | Python | Jupyter |
| 6_Transformers in 3D A Visual Journey | Python | Jupyter |
| 7_Transformers_in_3d_simplified | Python | Jupyter |
| 8_How_Transformers_Learn_Training_in_3D | Python | Jupyter |

Datasets

This repository includes 33 datasets:

| Dataset | Format | Size |
|---|---|---|
| data | | 0.0 KB |
| moodle_topic_mapping.json | .json | 8.07 KB |
| manifest.json | .json | 15.98 KB |
| link_report_20251208_0935.csv | .csv | 34.57 KB |
| link_report_20251208_0935.json | .json | 62.65 KB |
| search.json | .json | 6.16 KB |
| action_items.json | .json | 52.46 KB |
| chart_catalog.json | .json | 136.8 KB |
| comprehensive_fix_log.json | .json | 1.68 KB |
| course_overview.json | .json | 17.15 KB |
| embeddings.json | .json | 191.95 KB |
| fix_log.json | .json | 13.38 KB |
| lstm_primer.json | .json | 88.36 KB |
| master_catalog.json | .json | 2760.1 KB |
| nn_primer.json | .json | 206.31 KB |
| sentiment.json | .json | 98.77 KB |
| summarization.json | .json | 129.25 KB |
| week00.json | .json | 94.15 KB |
| week01.json | .json | 139.94 KB |
| week02.json | .json | 122.56 KB |
| week03.json | .json | 119.77 KB |
| week04.json | .json | 155.54 KB |
| week05.json | .json | 133.55 KB |
| week06.json | .json | 209.05 KB |
| week07.json | .json | 133.77 KB |
| week08.json | .json | 32.24 KB |
| week09.json | .json | 174.14 KB |
| week10.json | .json | 186.1 KB |
| week11.json | .json | 193.21 KB |
| week12.json | .json | 126.63 KB |
| moodle_data.json | .json | 35.12 KB |
| layout_report.json | .json | 8.32 KB |
| verification_results.json | .json | 20.89 KB |

Reproducibility

This repository includes reproducibility tools:

  • Python requirements.txt

  • Conda environment.yml

  • Makefile for automation

Latest Release

  • Version: latest-lectures
  • Name: NLP Course - All Lectures
  • Published: 2025-12-12

Status

  • Issues: Enabled
  • Wiki: Disabled
  • Pages: Enabled

README

NLP Course 2025: From N-grams to Transformers


QuantLet-Compatible Course Materials


A comprehensive Natural Language Processing course covering statistical foundations through modern transformer architectures. Build ChatGPT from scratch!

Quick Start (3 Steps)

```bash
# 1. Clone the repository
git clone https://github.com/josterri/2025_NLP_Lectures.git
cd 2025_NLP_Lectures

# 2. Install dependencies
pip install -r requirements.txt

# 3. Start learning!
jupyter lab NLP_slides/week02_neural_lm/lab/week02_word_embeddings_lab.ipynb
```

What You'll Learn

This course takes you from foundational statistical methods to state-of-the-art neural architectures:

  • Weeks 1-2: Statistical language models and word embeddings (Word2Vec, GloVe)
  • Weeks 3-4: Sequential models (RNN/LSTM) and sequence-to-sequence with attention
  • Weeks 5-7: Transformers, BERT, GPT, and advanced architectures
  • Weeks 8-10: Tokenization, decoding strategies, and fine-tuning
  • Weeks 11-12: Efficiency optimization and ethical AI deployment

By the end, you'll build a working transformer from scratch and understand the architecture behind ChatGPT and Claude.
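The statistical starting point (Weeks 1-2) can be sketched in a few lines of plain Python: a bigram language model with add-one smoothing, scored by perplexity. The toy corpus and test sentence below are purely illustrative, not taken from the course labs:

```python
import math
from collections import Counter

# Toy corpus; the labs use real text (e.g. Alice in Wonderland).
corpus = "the cat sat on the mat the cat ate".split()

# Count bigrams and the unigram contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])
vocab_size = len(set(corpus))

def bigram_prob(w1, w2, alpha=1.0):
    """P(w2 | w1) with add-alpha (Laplace) smoothing."""
    return (bigrams[(w1, w2)] + alpha) / (contexts[w1] + alpha * vocab_size)

# Perplexity of a held-out sentence: 2 ** (average negative log2 probability).
test = "the cat sat".split()
log_prob = sum(math.log2(bigram_prob(w1, w2)) for w1, w2 in zip(test, test[1:]))
perplexity = 2 ** (-log_prob / (len(test) - 1))
print(round(perplexity, 2))
```

Lower perplexity means the model finds the test text less surprising; Week 1 develops this measure in full.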

Course Structure

Core Materials (12 Weeks)

Each week includes:

  • Presentation: LaTeX/Beamer slides with optimal readability
  • Lab Notebook: Interactive Jupyter notebook with hands-on exercises
  • Handouts: Pre-class discovery exercises and post-class technical practice

Supplementary Modules

  • Neural Network Primer: Zero pre-knowledge intro to neural networks
  • LSTM Primer: Comprehensive deep dive into LSTM architecture (32 slides)
  • Embeddings Module: Standalone word embedding module with 3D visualizations
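The standalone embeddings module centers on vector arithmetic over word embeddings. A minimal sketch of the classic king - man + woman ≈ queen analogy, using made-up 2-d vectors (real labs load trained Word2Vec/GloVe embeddings; the values here are purely illustrative):

```python
import numpy as np

# Hypothetical 2-d embeddings: dimension 0 ~ gender, dimension 1 ~ royalty.
emb = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The analogy: king - man + woman should land closest to queen.
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w != "king"), key=lambda w: cosine(target, emb[w]))
print(best)
```

With trained embeddings the analogy is approximate rather than exact, which the module's 3D visualizations make vivid.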

Total Content

  • 60+ presentations (including versions and supplements)
  • 12 interactive lab notebooks
  • 40+ handout documents
  • 100+ Python-generated figures
  • 8 progressive visualization notebooks

Prerequisites

Required:

  • Python 3.8 or higher
  • Basic linear algebra (vectors, matrices)
  • Basic probability theory
  • Comfortable with Python programming

Helpful but not required:

  • PyTorch experience
  • Understanding of backpropagation
  • Machine learning fundamentals

New to neural networks? Start with our Neural Network Primer module before Week 2.

Installation

Option 1: pip

```bash
pip install -r requirements.txt
```

Option 2: conda

```bash
conda env create -f environment.yml
conda activate nlp2025
```

GPU Support

For GPU acceleration (recommended for Weeks 5+):

```bash
# CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
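After installing, a quick sanity check confirms whether PyTorch can see the GPU; the labs fall back to CPU otherwise. A minimal check that also handles a missing torch install gracefully:

```python
# Report which device the labs will run on.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "torch not installed"
print(device)
```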

See INSTALLATION.md for detailed setup instructions and troubleshooting.

Course Navigation

Week-by-Week Guide

Full navigation with topics, prerequisites, and learning objectives: COURSE_INDEX.md

Week Highlights

| Week | Topic | Key Concepts | Lab |
|---|---|---|---|
| 1 | Foundations | N-grams, perplexity, statistical LM | - |
| 2 | Word Embeddings | Word2Vec, GloVe, neural LM | Implement embeddings |
| 3 | RNN/LSTM | Sequential models, BPTT | Build LSTM from scratch |
| 4 | Seq2Seq | Attention mechanism, translation | Machine translation |
| 5 | Transformers | Self-attention, multi-head | Build transformer |
| 6 | Pre-trained | BERT, GPT, transfer learning | Fine-tune BERT |
| 7 | Advanced | T5, GPT-3, scaling laws | Experiment with GPT |
| 8 | Tokenization | BPE, WordPiece, SentencePiece | Implement tokenizer |
| 9 | Decoding | Beam, sampling, nucleus, contrastive | Compare 6 methods |
| 10 | Fine-tuning | LoRA, prompt engineering | Adapt models |
| 11 | Efficiency | Quantization, distillation | Optimize models |
| 12 | Ethics | Bias, fairness, safety | Measure bias |
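The core operation behind Week 5's "Build transformer" lab is scaled dot-product attention. A NumPy sketch of the standard formula, softmax(QK^T / sqrt(d_k))V, with random weights and a 4-token input (shapes and values purely illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                         # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape, attn.sum(axis=-1))                 # each token's weights sum to 1
```

Multi-head attention, covered in the same lab, runs several such maps in parallel over sliced projections and concatenates the results.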

Quantlet Charts

All Python-generated visualizations follow the Quantlet standard format with:

  • Numbered folders (01_chart_name/, 02_chart_name/, etc.)
  • Self-contained Python scripts
  • Standard metainfo.txt with description, keywords, and usage

Final Lecture Charts

See FinalLecture/ for 8 Quantlet-formatted visualizations covering:

  • Vector database architecture
  • HNSW nearest neighbor search
  • RAG conditional probabilities
  • Hybrid search flow

Project Structure

```text
├── FinalLecture/                # Quantlet-formatted charts (Final Lecture)
├── logo/                        # Quantlet branding
├── NLP_slides/
│   ├── week01_foundations/      # Week 1: Statistical LM
│   ├── week02_neural_lm/        # Week 2: Word embeddings
│   ├── week03_rnn/              # Week 3: RNN/LSTM/GRU
│   ├── ...                      # Weeks 4-12
│   ├── nn_primer/               # Neural network primer
│   ├── lstm_primer/             # LSTM deep dive
│   └── common/                  # Shared templates and utils
├── embeddings/                  # Standalone embeddings module
├── exercises/                   # Additional practice
├── figures/                     # Shared visualizations
├── requirements.txt             # Python dependencies
├── environment.yml              # Conda environment
└── COURSE_INDEX.md              # Full course navigation
```

Key Learning Milestones

  • After Week 2: Understand and implement word embeddings
  • After Week 3: Build RNN and LSTM from scratch
  • After Week 5: Comprehend transformer architecture completely
  • After Week 6: Fine-tune pre-trained models (BERT, GPT)
  • After Week 9: Control text generation quality and diversity
  • After Week 12: Deploy models responsibly with ethical considerations
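The Week 9 milestone, controlling generation quality and diversity, can be illustrated with nucleus (top-p) sampling, one of the decoding strategies the Week 9 lab compares. A sketch over a toy next-token distribution (the probabilities are made up for illustration):

```python
import numpy as np

def nucleus_sample(probs, p=0.9, rng=None):
    """Sample from the smallest set of tokens whose cumulative probability >= p."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]                 # token ids, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1            # size of the nucleus
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()  # renormalize inside the nucleus
    return int(rng.choice(nucleus, p=renorm))

probs = np.array([0.5, 0.3, 0.15, 0.04, 0.01])     # toy next-token distribution
token = nucleus_sample(probs, p=0.9, rng=np.random.default_rng(0))
print(token)
```

With p=0.9 the nucleus here is the top three tokens; the unlikely tail is never sampled, which is how nucleus sampling trades diversity against degenerate output.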

Usage Examples

Run a Lab Notebook

```bash
# Option A: start Jupyter Lab and browse to a week's lab folder
jupyter lab

# Option B: open a specific week's lab directly
cd NLP_slides/week05_transformers/lab
jupyter notebook week05_transformer_lab.ipynb
```

Compile a Presentation

```bash
cd NLP_slides/week02_neural_lm/presentations
pdflatex week02_neural_lm.tex
```

Generate Figures

```bash
cd NLP_slides/week05_transformers/python
python generate_week05_optimal_charts.py
```

Testing the Course

Test all lab notebooks for execution:

```bash
python test_notebooks.py
```

This validates that all 12 lab notebooks execute correctly in your environment.

Course Delivery Options

Standard 12-Week Semester

  • One week per topic
  • Weekly labs and assignments
  • Suitable for undergraduate/graduate courses

Intensive 8-Week Course

  • Combine Weeks 1-2, skip some advanced topics
  • Accelerated pace for bootcamps
  • Focus on core transformer concepts

Self-Paced Learning

  • Progress at your own speed
  • Complete prerequisite modules first
  • Focus on labs and hands-on practice

Support and Resources

  • Issues: Report problems at GitHub Issues
  • Prerequisites: Check the Neural Network Primer if you're new to deep learning
  • GPU Requirements: Most labs work on CPU; Weeks 5+ benefit from GPU

Contributing

Contributions are welcome! Areas for contribution:

  • Additional exercises and examples
  • Translations to other languages
  • MSc-level challenge problems
  • Bug fixes and improvements

License

This course is released under the MIT License. See LICENSE for details.

Acknowledgments

Course materials developed with a pedagogical focus on:

  • Discovery-based learning
  • Concrete-to-abstract progression
  • Hands-on implementation
  • Real-world applications

Built with LaTeX/Beamer, Python, PyTorch, and Jupyter.

Citation

If you use these materials in your course or research, please cite:

```bibtex
@misc{nlp2025course,
  title={NLP Course 2025: From N-grams to Transformers},
  author={Joerg Osterrieder},
  year={2025},
  url={https://github.com/josterri/2025_NLP_Lectures}
}
```

Ready to start? Check INSTALLATION.md for setup, then dive into Week 2's word embeddings lab!

Questions? See COURSE_INDEX.md for complete navigation and prerequisites.
