statistical-data-analysis
BSc Statistical Data Analysis - Complete 5-lesson course covering regression, hypothesis testing, PCA/EFA, clustering, and time series analysis with Beamer presentations and Python visualizations
Information
| Property | Value |
|---|---|
| Language | HTML |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 0 |
| License | MIT License |
| Created | 2025-12-16 |
| Last Updated | 2026-03-16 |
| Last Push | 2026-03-16 |
| Contributors | 1 |
| Default Branch | main |
| Visibility | private |
Notebooks
This repository contains 1 notebook:
| Notebook | Language | Type |
|---|---|---|
| semester_thesis | PYTHON | jupyter |
Datasets
This repository includes 25 dataset(s):
| Dataset | Format | Size |
|---|---|---|
| charts.json | .json | 48.01 KB |
| questions.json | .json | 14.71 KB |
| datasets | | 0.0 KB |
| README.md | .md | 3.59 KB |
| agricultural_experiment.csv | .csv | 3.81 KB |
| clinical_trial.csv | .csv | 2.18 KB |
| drug_dosage_study.csv | .csv | 3.42 KB |
| education_intervention.csv | .csv | 0.6 KB |
| employee_satisfaction.csv | .csv | 3.41 KB |
| environmental_study.csv | .csv | 6.36 KB |
| manufacturing_quality.csv | .csv | 0.64 KB |
| marketing_campaigns.csv | .csv | 4.03 KB |
| medical_treatment.csv | .csv | 2.32 KB |
| reaction_time_study.csv | .csv | 2.99 KB |
| website_testing.csv | .csv | 71.91 KB |
| questions.json | .json | 14.57 KB |
| questions.json | .json | 16.95 KB |
| questions.json | .json | 15.57 KB |
| questions.json | .json | 18.0 KB |
| bullet_reduction_results.json | .json | 2.31 KB |
| deep_slide_review.json | .json | 436.25 KB |
| slide_analysis.json | .json | 461.55 KB |
| slide_improvement_results.json | .json | 23.23 KB |
| slide_improvement_tasks.json | .json | 126.56 KB |
| ultra_deep_review.json | .json | 618.07 KB |
Reproducibility
This repository includes reproducibility tools:
- Python requirements.txt
Status
- Issues: Enabled
- Wiki: Enabled
- Pages: Enabled
README
Statistical Data Analysis
Teaching materials for a BSc-level course in Statistical Data Analysis, covering regression, hypothesis testing, dimensionality reduction, clustering, and time series analysis.
Course Overview
This repository contains complete lecture materials including Beamer slide decks, Python visualization scripts, R code examples, interactive quizzes, and practice datasets. Materials emphasize practical applications with real-world examples and hands-on exercises.
Course Structure
| Lesson | Topic | Slides | Charts | Quiz |
|---|---|---|---|---|
| 1 | Linear Regression & Survival Analysis | 120 frames (2 decks) | 110 | 20 questions |
| 2 | Hypothesis Testing | 25 frames | 39 | 20 questions |
| 3 | PCA & Exploratory Factor Analysis | 38 frames | 43 | 20 questions |
| 4 | Cluster Analysis | 106 frames | 32 | 20 questions |
| 5 | Time Series Analysis | 53 frames | 40 | 20 questions |
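The totals quoted elsewhere in this README (264 charts in the inventory, 100 quiz questions) can be re-derived from the table above. The short sketch below copies the per-lesson counts from the table and sums each column; the dictionary layout and the `total` helper are illustrative names, not part of the repository.

```python
# Per-lesson counts copied from the Course Structure table.
LESSONS = {
    1: {"frames": 120, "charts": 110, "questions": 20},
    2: {"frames": 25, "charts": 39, "questions": 20},
    3: {"frames": 38, "charts": 43, "questions": 20},
    4: {"frames": 106, "charts": 32, "questions": 20},
    5: {"frames": 53, "charts": 40, "questions": 20},
}


def total(field: str) -> int:
    """Sum one column of the table across all five lessons."""
    return sum(row[field] for row in LESSONS.values())


print(total("charts"))     # 264, matches the chart inventory size
print(total("questions"))  # 100, matches the quiz question total
```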
Technical Details
- Format: LaTeX Beamer presentations (Madrid theme, 8pt, 16:9 aspect ratio)
- Charts: Python (matplotlib, seaborn); PNG for Lessons 1–4, standalone chart.py → PDF for Lesson 5
- Code examples: R and Python throughout all lessons
- Quizzes: 100 multiple-choice questions total — interactive HTML + LaTeX PDF per lesson
- Practice datasets: 11 CSV files for hypothesis testing exercises
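As a rough illustration of the kind of exercise the practice CSV files support, the sketch below reads a two-group CSV and computes Welch's t statistic using only the standard library. The column names `group` and `score`, the helper names, and the toy values are assumptions made for this example; the real datasets may use a different layout.

```python
import csv
import io
import math
import statistics


def load_groups(csv_file, group_col, value_col):
    """Collect numeric values per group label from an open CSV file."""
    groups = {}
    for row in csv.DictReader(csv_file):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return groups


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = statistics.variance(sample_a) / len(sample_a)  # squared standard error
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t_stat = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(
        se2_a + se2_b
    )
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Toy values for illustration only; they are not taken from the repository.
toy_csv = io.StringIO(
    "group,score\n"
    "control,1\ncontrol,2\ncontrol,3\ncontrol,4\ncontrol,5\n"
    "treatment,2\ntreatment,4\ntreatment,6\ntreatment,8\ntreatment,10\n"
)
groups = load_groups(toy_csv, group_col="group", value_col="score")
t_stat, dof = welch_t(groups["control"], groups["treatment"])
print(f"t = {t_stat:.3f}, df = {dof:.2f}")  # t = -1.897, df = 5.88
```

With the packages from requirements.txt installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.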
Folder Structure
statistical-data-analysis/
├── lesson1/ # Regression & Survival Analysis
│ ├── linear_regression_complete.tex
│ ├── survival_km_slides.tex
│ ├── images/ # 110 generated PNG charts
│ ├── questions.json # 20 MC questions
│ ├── lesson1_quiz.tex
│ ├── generate_*.py # Chart generation scripts
│ └── regression_examples.R
├── lesson2_hypothesis/ # Hypothesis Testing
│ ├── hypothesis_testing.tex
│ ├── images/ # 39 generated PNG charts
│ ├── datasets/ # 11 practice CSV files
│ ├── questions.json
│ └── lesson2_hypothesis_quiz.tex
├── lesson3_pca_efa/ # PCA & EFA
│ ├── pca_efa_komplett.tex
│ ├── images/ # 43 generated PNG charts
│ ├── questions.json
│ └── lesson3_pca_efa_quiz.tex
├── lesson4_clustering/ # Cluster Analysis
│ ├── cluster_analysis_with_distance_slides.tex
│ ├── images/ # 32 generated PNG charts
│ ├── questions.json
│ └── lesson4_clustering_quiz.tex
├── lesson5_timeseries/ # Time Series Analysis
│ ├── time_series_analysis.tex
│ ├── [40 chart folders]/ # Each with chart.py → chart.pdf
│ ├── questions.json
│ └── lesson5_timeseries_quiz.tex
├── quiz/ # Interactive HTML quizzes
│ ├── quiz_L01.html ... quiz_L05.html
├── utils/ # Shared utilities
│ └── generate_quiz.py
├── reviews/ # Quality review tools
├── docs/ # GitHub Pages dashboard
│ └── index.html
├── .github/workflows/ # CI/CD
│ ├── compile_slides.yml
│ ├── validate_charts.yml
│ └── validate_quizzes.yml
├── requirements.txt # Python dependencies
├── notation.tex # Shared LaTeX notation commands
├── notation_style_guide.md # Notation conventions
├── charts.json # Chart inventory (264 total)
├── CHANGELOG.md
├── CONTRIBUTING.md
└── template_beamer_final.tex # Beamer template
Requirements
Python 3.12+
Key packages: matplotlib, numpy, scipy, seaborn, pandas, scikit-learn, lifelines, statsmodels, arch
LaTeX
A LaTeX distribution with Beamer support (e.g., MiKTeX on Windows, TeX Live on Linux/macOS).
R
Packages: ggplot2, survival, cluster, factoextra, forecast, tseries
Quick Start
# Clone the repository
git clone https://github.com/Digital-AI-Finance/statistical-data-analysis.git
cd statistical-data-analysis
# Install Python dependencies
pip install -r requirements.txt
# Compile a slide deck
cd lesson1
pdflatex linear_regression_complete.tex
# Generate charts for Lesson 5
cd ../lesson5_timeseries
python ts_decomposition/chart.py
# Run quiz generation
cd ..
python generate_all_quizzes.py
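The Quick Start runs a single Lesson 5 chart script. To regenerate every chart folder in one pass, something like the following sketch could be used. It assumes that each chart folder sits directly under the lesson directory and that each chart.py can be launched from the lesson directory, as in the command above; `find_chart_scripts` and `run_chart_scripts` are hypothetical helper names, not scripts shipped with the repository.

```python
import subprocess
import sys
from pathlib import Path


def find_chart_scripts(lesson_dir):
    """Return every <chart folder>/chart.py directly under lesson_dir, sorted."""
    return sorted(Path(lesson_dir).glob("*/chart.py"))


def run_chart_scripts(lesson_dir):
    """Run each chart script from lesson_dir; return the scripts that failed."""
    lesson_dir = Path(lesson_dir)
    failed = []
    for script in find_chart_scripts(lesson_dir):
        # Mirror the Quick Start: invoke "<folder>/chart.py" with the lesson
        # directory as the working directory.
        result = subprocess.run(
            [sys.executable, str(script.relative_to(lesson_dir))],
            cwd=lesson_dir,
        )
        if result.returncode != 0:
            failed.append(script)
    return failed
```

From the repository root, `run_chart_scripts("lesson5_timeseries")` would return an empty list when every script exits cleanly.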
Interactive Quizzes
Each lesson includes 20 multiple-choice questions, available in two formats:
- HTML (quiz/quiz_L01.html – quiz_L05.html): interactive, browser-based, with KaTeX math rendering
- PDF (lesson*/lesson*_quiz.tex, e.g. lesson1/lesson1_quiz.tex): LaTeX Beamer with pause-reveal answers
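A basic sanity check on a lesson's questions.json might look like the sketch below. The only fact taken from this README is the count of 20 questions per lesson. The top-level list layout and the key names `question`, `options`, and `answer` are assumptions for illustration, and this is not the logic of the repository's validate_quizzes.yml workflow.

```python
import json

EXPECTED_QUESTIONS = 20  # per lesson, as stated in this README


def load_questions(path):
    """Load a questions.json file, assumed here to hold a top-level list."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def validate_quiz(questions, required_keys=("question", "options", "answer")):
    """Return a list of problems found; an empty list means the quiz passed.

    The key names are hypothetical: adjust them to the real schema.
    """
    problems = []
    if len(questions) != EXPECTED_QUESTIONS:
        problems.append(
            f"expected {EXPECTED_QUESTIONS} questions, found {len(questions)}"
        )
    for index, item in enumerate(questions, start=1):
        missing = [key for key in required_keys if key not in item]
        if missing:
            problems.append(f"question {index}: missing {', '.join(missing)}")
    return problems
```

For example, `validate_quiz(load_questions("lesson1/questions.json"))` would return an empty list if the file matched the assumed layout.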
Course Website
https://digital-ai-finance.github.io/statistical-data-analysis/
Contributing
See CONTRIBUTING.md for guidelines.
License
Educational use only.
© Joerg Osterrieder 2025–2026