ml-for-innovation-phd-seminar
ML for Innovation Research: PhD/DBA Seminar - 8 modules with Beamer slides, notebooks, and interactive quizzes
Information
| Property | Value |
|---|---|
| Language | Python |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 1 |
| License | No License |
| Created | 2026-03-07 |
| Last Updated | 2026-03-12 |
| Last Push | 2026-03-18 |
| Contributors | 1 |
| Default Branch | main |
| Visibility | private |
Notebooks
This repository contains 6 notebooks:
| Notebook | Language | Type |
|---|---|---|
| 01_data_exploration | PYTHON | jupyter |
| 02_clustering | PYTHON | jupyter |
| 03_nlp_sentiment | PYTHON | jupyter |
| 04_topic_modeling | PYTHON | jupyter |
| 06_classification | PYTHON | jupyter |
| 07_structured_output | PYTHON | jupyter |
Datasets
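This repository includes 25 data-related files and folders (quiz questions, derived tables, generator scripts, and the main survey dataset):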
| Dataset | Format | Size |
|---|---|---|
| questions.json | .json | 3.28 KB |
| questions.json | .json | 3.22 KB |
| questions.json | .json | 3.19 KB |
| questions.json | .json | 3.13 KB |
| questions.json | .json | 2.81 KB |
| questions.json | .json | 4.29 KB |
| questions.json | .json | 3.52 KB |
| questions.json | .json | 3.02 KB |
| charts.json | .json | 27.44 KB |
| data/ | folder | 0.0 KB |
| cached_responses/ | folder | 0.0 KB |
| README.md | .md | 0.75 KB |
| structured_output_samples.json | .json | 10.08 KB |
| derived/ | folder | 0.0 KB |
| cluster_labels.csv | .csv | 42.46 KB |
| cross_references.csv | .csv | 23.49 KB |
| model_predictions.csv | .csv | 26.27 KB |
| sentiment_scores.csv | .csv | 17.59 KB |
| shap_values.csv | .csv | 36.65 KB |
| text_features.csv | .csv | 15.42 KB |
| topic_model.csv | .csv | 51.7 KB |
| topic_words.json | .json | 3.72 KB |
| generate_dataset.py | .py | 12.05 KB |
| generate_derived.py | .py | 12.21 KB |
| swiss_innovation_survey.csv | .csv | 248.75 KB |
Reproducibility
This repository includes reproducibility tools:
- Python requirements.txt
Status
- Issues: Enabled
- Wiki: Enabled
- Pages: Enabled
README
ML for Innovation Research: PhD/DBA Seminar
A hands-on seminar in two 3-hour sessions that teaches DBA/PhD students how to apply machine learning as a research tool for innovation studies.
Course Structure
| Module | Topic | Key Techniques |
|---|---|---|
| 01 | Opening: ML for Innovation Research | ML paradigms, workflow, dataset intro |
| 02 | Clustering & PCA | K-Means, PCA/UMAP, innovator archetypes |
| 03 | NLP & Sentiment Analysis | VADER sentiment, TF-IDF, text analysis |
| 04 | Synthesis & Review | LDA topic modeling, cross-tabulation |
| 05 | Recap & Supervised Learning | Session 1 recap, features, labels |
| 06 | Classification | Random Forest, Logistic Regression, ROC/AUC |
| 07 | Generative AI | LLMs, prompt engineering, structured output |
| 08 | Complete Toolkit & Closing | Decision framework, thesis patterns |
Folder Structure
ml-for-innovation-phd-seminar/
├── 01_opening/ # Module folders with slides, notebooks, charts
│ ├── 01_opening.tex # Standalone Beamer slides
│ ├── 01_opening_quiz.tex # Quiz slides
│ ├── 01_data_exploration.ipynb # Demo notebook
│ ├── 01_innovation_data_overview/
│ │ ├── chart.py # Standalone chart generator
│ │ └── chart.pdf # Generated chart
│ └── questions.json # Quiz questions
├── 02_clustering/ ... 08_toolkit/
├── cheatsheets/ # One-page reference sheets per module
├── quiz/ # Interactive HTML quizzes
├── js/ # Shared quiz JavaScript engine
├── data/ # Dataset and cached LLM responses
├── handouts/ # Toolkit card and reading list
├── template_beamer_final.tex # Shared Beamer preamble
├── notation.tex # ML notation shortcuts
├── references.bib # Bibliography entries
├── charts.json # Chart registry
├── index.html # Course web page
├── compile.py # Build script
├── extract_charts.py # Regenerate all charts
├── compile_quizzes.py # Compile all quiz PDFs
└── generate_thumbnails.py # Generate PDF thumbnails
Setup
pip install -r requirements.txt
python data/generate_dataset.py # regenerate dataset (optional)
python extract_charts.py # generate all chart PDFs
python compile.py --all # compile everything
Building
python compile.py # module slides only
python compile.py --charts # regenerate charts + compile slides
python compile.py --quizzes # compile quiz PDFs
python compile.py --cheatsheets # compile cheatsheet PDFs
python compile.py --all # everything
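The source of compile.py is not shown here, so the following is only a minimal sketch of how the flags listed above could map to build steps. The function name `plan_steps` and the step labels are hypothetical, and it assumes that `--quizzes` and `--cheatsheets` each build only their own target.

```python
import argparse


def plan_steps(argv):
    """Return the ordered build steps implied by the command-line flags."""
    parser = argparse.ArgumentParser(prog="compile.py")
    for flag in ("charts", "quizzes", "cheatsheets", "all"):
        parser.add_argument(f"--{flag}", action="store_true")
    args = parser.parse_args(argv)

    steps = []
    if args.charts or args.all:
        steps.append("charts")
    # Slides are built by default, with --charts, and with --all.
    # Assumption: --quizzes / --cheatsheets alone skip the slides.
    if args.all or args.charts or not (args.quizzes or args.cheatsheets):
        steps.append("slides")
    if args.quizzes or args.all:
        steps.append("quizzes")
    if args.cheatsheets or args.all:
        steps.append("cheatsheets")
    return steps


print(plan_steps([]))
print(plan_steps(["--charts"]))
print(plan_steps(["--all"]))
```

Keeping the flag-to-step mapping in one pure function makes it easy to check the build plan without invoking LaTeX.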
Dataset
data/swiss_innovation_survey.csv: 500 synthetic innovation projects, generated with 4 latent innovator archetypes embedded for the clustering exercises.
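To make the idea of latent archetypes concrete, here is a minimal sketch. It is not the code in generate_dataset.py or the clustering notebook. It assumes a made-up two-dimensional feature space with four invented centers, draws 500 points around them, and recovers the groups with a plain-Python K-Means. A deterministic farthest-point initialization is swapped in for random starts so the sketch is reproducible. The real CSV's columns are not shown in this README, and the notebooks may well use a library implementation instead.

```python
import random

# Hypothetical archetype centers in a made-up 2-D feature space;
# the real survey columns are not shown in this README.
CENTERS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]


def make_archetype_data(n=500, spread=0.5, seed=0):
    """Draw n points, cycling through the centers, with Gaussian noise."""
    rng = random.Random(seed)
    points, truth = [], []
    for i in range(n):
        k = i % len(CENTERS)
        cx, cy = CENTERS[k]
        points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
        truth.append(k)
    return points, truth


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def farthest_point_init(points, k):
    """Pick k starting centroids, each as far as possible from the others."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k=4, iters=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


points, truth = make_archetype_data()
labels, centroids = kmeans(points)
print(sorted(labels.count(j) for j in range(4)))
```

Cluster ids are arbitrary, so a recovered label only matches a true archetype up to renaming. Comparing the set of (true, predicted) pairs is one simple way to check agreement.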
Requirements
- Python 3.9+
- LaTeX with Beamer (TeX Live or MiKTeX)
- No API keys needed (structured output demo uses cached responses)
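The note about cached responses suggests a replay pattern: look up a stored answer instead of calling a model. The sketch below illustrates that idea under assumptions. The cache layout (one JSON object keyed by a SHA-256 hash of the prompt), the function names, and the demo values are all hypothetical. The actual structure of data/cached_responses and structured_output_samples.json is not shown in this README.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def prompt_key(prompt):
    """Stable cache key for a prompt (hypothetical keying scheme)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def cached_completion(prompt, cache_path):
    """Return the stored response for a prompt; never calls an API."""
    cache = json.loads(Path(cache_path).read_text(encoding="utf-8"))
    key = prompt_key(prompt)
    if key not in cache:
        raise KeyError(f"No cached response for prompt: {prompt!r}")
    return cache[key]


# Demo with a throwaway cache file and made-up content.
demo_prompt = "Classify this innovation project description."
demo_response = {"category": "incremental", "confidence": 0.8}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "responses.json"
    cache_file.write_text(
        json.dumps({prompt_key(demo_prompt): demo_response}),
        encoding="utf-8",
    )
    result = cached_completion(demo_prompt, cache_file)

print(result)
```

Raising on a cache miss, rather than falling back to a live call, keeps a demo reproducible and makes missing entries obvious.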