statistical-data-analysis

BSc Statistical Data Analysis - Complete 5-lesson course covering regression, hypothesis testing, PCA/EFA, clustering, and time series analysis with Beamer presentations and Python visualizations

Information

| Property | Value |
| --- | --- |
| Language | HTML |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 0 |
| License | MIT License |
| Created | 2025-12-16 |
| Last Updated | 2026-03-16 |
| Last Push | 2026-03-16 |
| Contributors | 1 |
| Default Branch | main |
| Visibility | private |

Notebooks

This repository contains 1 notebook(s):

| Notebook | Language | Type |
| --- | --- | --- |
| semester_thesis | PYTHON | jupyter |

Datasets

This repository includes 25 dataset(s):

| Dataset | Format | Size |
| --- | --- | --- |
| charts.json | .json | 48.01 KB |
| questions.json | .json | 14.71 KB |
| datasets | | 0.0 KB |
| README.md | .md | 3.59 KB |
| agricultural_experiment.csv | .csv | 3.81 KB |
| clinical_trial.csv | .csv | 2.18 KB |
| drug_dosage_study.csv | .csv | 3.42 KB |
| education_intervention.csv | .csv | 0.6 KB |
| employee_satisfaction.csv | .csv | 3.41 KB |
| environmental_study.csv | .csv | 6.36 KB |
| manufacturing_quality.csv | .csv | 0.64 KB |
| marketing_campaigns.csv | .csv | 4.03 KB |
| medical_treatment.csv | .csv | 2.32 KB |
| reaction_time_study.csv | .csv | 2.99 KB |
| website_testing.csv | .csv | 71.91 KB |
| questions.json | .json | 14.57 KB |
| questions.json | .json | 16.95 KB |
| questions.json | .json | 15.57 KB |
| questions.json | .json | 18.0 KB |
| bullet_reduction_results.json | .json | 2.31 KB |
| deep_slide_review.json | .json | 436.25 KB |
| slide_analysis.json | .json | 461.55 KB |
| slide_improvement_results.json | .json | 23.23 KB |
| slide_improvement_tasks.json | .json | 126.56 KB |
| ultra_deep_review.json | .json | 618.07 KB |

Reproducibility

This repository includes reproducibility tools:

  • Python requirements.txt

Status

  • Issues: Enabled
  • Wiki: Enabled
  • Pages: Enabled

README

Statistical Data Analysis

Teaching materials for a BSc-level course in Statistical Data Analysis, covering regression, hypothesis testing, dimensionality reduction, clustering, and time series analysis.

Course Overview

This repository contains complete lecture materials including Beamer slide decks, Python visualization scripts, R code examples, interactive quizzes, and practice datasets. Materials emphasize practical applications with real-world examples and hands-on exercises.

Course Structure

| Lesson | Topic | Slides | Charts | Quiz |
| --- | --- | --- | --- | --- |
| 1 | Linear Regression & Survival Analysis | 120 frames (2 decks) | 110 | 20 questions |
| 2 | Hypothesis Testing | 25 frames | 39 | 20 questions |
| 3 | PCA & Exploratory Factor Analysis | 38 frames | 43 | 20 questions |
| 4 | Cluster Analysis | 106 frames | 32 | 20 questions |
| 5 | Time Series Analysis | 53 frames | 40 | 20 questions |

Technical Details

  • Format: LaTeX Beamer presentations (Madrid theme, 8pt, 16:9 aspect ratio)
  • Charts: Python (matplotlib, seaborn) — PNG for Lessons 1–4, standalone chart.py → PDF for Lesson 5
  • Code examples: R and Python throughout all lessons
  • Quizzes: 100 multiple-choice questions total — interactive HTML + LaTeX PDF per lesson
  • Practice datasets: 11 CSV files for hypothesis testing exercises

Folder Structure

statistical-data-analysis/
├── lesson1/                          # Regression & Survival Analysis
│   ├── linear_regression_complete.tex
│   ├── survival_km_slides.tex
│   ├── images/                       # 110 generated PNG charts
│   ├── questions.json                # 20 MC questions
│   ├── lesson1_quiz.tex
│   ├── generate_*.py                 # Chart generation scripts
│   └── regression_examples.R
├── lesson2_hypothesis/               # Hypothesis Testing
│   ├── hypothesis_testing.tex
│   ├── images/                       # 39 generated PNG charts
│   ├── datasets/                     # 11 practice CSV files
│   ├── questions.json
│   └── lesson2_hypothesis_quiz.tex
├── lesson3_pca_efa/                  # PCA & EFA
│   ├── pca_efa_komplett.tex
│   ├── images/                       # 43 generated PNG charts
│   ├── questions.json
│   └── lesson3_pca_efa_quiz.tex
├── lesson4_clustering/               # Cluster Analysis
│   ├── cluster_analysis_with_distance_slides.tex
│   ├── images/                       # 32 generated PNG charts
│   ├── questions.json
│   └── lesson4_clustering_quiz.tex
├── lesson5_timeseries/               # Time Series Analysis
│   ├── time_series_analysis.tex
│   ├── [40 chart folders]/           # Each with chart.py → chart.pdf
│   ├── questions.json
│   └── lesson5_timeseries_quiz.tex
├── quiz/                             # Interactive HTML quizzes
│   ├── quiz_L01.html ... quiz_L05.html
├── utils/                            # Shared utilities
│   └── generate_quiz.py
├── reviews/                          # Quality review tools
├── docs/                             # GitHub Pages dashboard
│   └── index.html
├── .github/workflows/                # CI/CD
│   ├── compile_slides.yml
│   ├── validate_charts.yml
│   └── validate_quizzes.yml
├── requirements.txt                  # Python dependencies
├── notation.tex                      # Shared LaTeX notation commands
├── notation_style_guide.md           # Notation conventions
├── charts.json                       # Chart inventory (264 total)
├── CHANGELOG.md
├── CONTRIBUTING.md
└── template_beamer_final.tex         # Beamer template
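
The per-lesson chart counts annotated in the tree (110, 39, 43, 32, and 40, which sum to the 264 recorded for charts.json) can be checked against a local checkout. The following is a minimal sketch, assuming the layout shown above: PNG files directly inside each images/ folder for Lessons 1–4, and one chart.py per chart folder for Lesson 5. The names `count_charts` and `EXPECTED` are illustrative and not part of the repository.

```python
from pathlib import Path

# Expected counts copied from the comments in the folder tree above.
EXPECTED = {
    "lesson1": 110,
    "lesson2_hypothesis": 39,
    "lesson3_pca_efa": 43,
    "lesson4_clustering": 32,
    "lesson5_timeseries": 40,
}


def count_charts(root):
    """Return {lesson_folder: chart_count} for a repository checkout at `root`."""
    root = Path(root)
    counts = {}
    for lesson in EXPECTED:
        folder = root / lesson
        if lesson == "lesson5_timeseries":
            # Lesson 5: one chart.py per chart folder (assumed one level deep).
            counts[lesson] = len(list(folder.glob("*/chart.py")))
        else:
            # Lessons 1-4: PNG files directly inside images/.
            counts[lesson] = len(list((folder / "images").glob("*.png")))
    return counts


if __name__ == "__main__":
    found = count_charts(".")
    for lesson, expected in EXPECTED.items():
        status = "ok" if found[lesson] == expected else "MISMATCH"
        print(f"{lesson}: found {found[lesson]}, expected {expected} [{status}]")
    print("total expected:", sum(EXPECTED.values()))
```

Run from the repository root, the script prints one line per lesson. A missing folder simply counts as zero charts rather than raising an error.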

Requirements

Python 3.12+

pip install -r requirements.txt

Key packages: matplotlib, numpy, scipy, seaborn, pandas, scikit-learn, lifelines, statsmodels, arch
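
After installation, it can be useful to confirm that the key packages are importable. The snippet below is a small sketch using only the standard library. The mapping from package name to import name is an assumption based on the usual conventions (scikit-learn is imported as `sklearn`), and `missing_packages` is a hypothetical helper, not a script shipped with this repository.

```python
from importlib.util import find_spec

# Package name as listed above -> module name used in `import` statements.
# Only scikit-learn differs from its import name.
KEY_PACKAGES = {
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "scipy": "scipy",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "lifelines": "lifelines",
    "statsmodels": "statsmodels",
    "arch": "arch",
}


def missing_packages(packages=KEY_PACKAGES):
    """Return the package names whose module cannot be found for import."""
    return [name for name, module in packages.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All key packages are importable.")
```

The check only looks up each module without importing it, so it runs quickly and does not execute any package code.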

LaTeX

A LaTeX distribution with Beamer support (e.g., MiKTeX on Windows, TeX Live on Linux/macOS).

R

Packages: ggplot2, survival, cluster, factoextra, forecast, tseries

Quick Start

# Clone the repository
git clone https://github.com/Digital-AI-Finance/statistical-data-analysis.git
cd statistical-data-analysis

# Install Python dependencies
pip install -r requirements.txt

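# Compile a slide deck (the second pdflatex pass lets Beamer resolve navigation and cross-references)
cd lesson1
pdflatex linear_regression_complete.tex
pdflatex linear_regression_complete.tex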

# Generate charts for Lesson 5
cd ../lesson5_timeseries
python ts_decomposition/chart.py

# Run quiz generation
cd ..
python generate_all_quizzes.py
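
The Quick Start above regenerates a single Lesson 5 chart. To regenerate all of them, one option is a short driver script. The sketch below assumes each chart folder sits directly under lesson5_timeseries/ and that every chart.py can be invoked from the lesson folder, as in the command above. `run_chart_scripts` is a hypothetical helper and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def run_chart_scripts(lesson_dir):
    """Run every */chart.py under lesson_dir; return the folders whose script failed."""
    lesson_dir = Path(lesson_dir)
    failed = []
    for script in sorted(lesson_dir.glob("*/chart.py")):
        # Mirror the Quick Start invocation: `python <folder>/chart.py`
        # executed from inside the lesson folder.
        relative = script.relative_to(lesson_dir)
        result = subprocess.run([sys.executable, str(relative)], cwd=lesson_dir)
        if result.returncode != 0:
            failed.append(script.parent.name)
    return failed


if __name__ == "__main__":
    failures = run_chart_scripts("lesson5_timeseries")
    if failures:
        print("Failed chart folders:", ", ".join(failures))
    else:
        print("All chart scripts finished without errors.")
```

Run from the repository root, the script executes the charts one after another and reports any folder whose chart.py exited with a non-zero status.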

Interactive Quizzes

Each lesson includes 20 multiple-choice questions, available in two formats:

  • HTML (quiz/quiz_L01.html – quiz_L05.html): interactive and browser-based, with KaTeX math rendering
  • PDF (lesson*/lesson*_quiz.tex): LaTeX Beamer with pause-reveal answers
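
A quick consistency check on the question banks could look like the sketch below. The schema of questions.json is not documented in this README, so the code assumes each file holds either a top-level JSON array of questions or an object with a `questions` array. Both that assumption and the helper names `count_questions` and `check_quizzes` are hypothetical and should be adjusted to the real files.

```python
import json
from pathlib import Path


def count_questions(path):
    """Return the number of questions stored in one questions.json file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # ASSUMPTION: a top-level array, or an object with a "questions" array.
    if isinstance(data, dict):
        data = data.get("questions", [])
    if not isinstance(data, list):
        raise ValueError(f"Unexpected structure in {path}")
    return len(data)


def check_quizzes(root, expected=20):
    """Return {relative_path: count} for lesson files that do not hold `expected` questions."""
    root = Path(root)
    problems = {}
    for path in sorted(root.glob("lesson*/questions.json")):
        count = count_questions(path)
        if count != expected:
            problems[path.relative_to(root).as_posix()] = count
    return problems


if __name__ == "__main__":
    issues = check_quizzes(".")
    if issues:
        for name, count in issues.items():
            print(f"{name}: {count} questions")
    else:
        print("No lesson question bank deviates from the expected count.")
```

An empty result means that every lesson folder found under the given root holds exactly 20 questions under the assumed schema.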

Course Website

https://digital-ai-finance.github.io/statistical-data-analysis/

Contributing

See CONTRIBUTING.md for guidelines.

License

Educational use only.

© Joerg Osterrieder 2025–2026