
ml-for-innovation-phd-seminar

ML for Innovation Research: PhD/DBA Seminar - 8 modules with Beamer slides, notebooks, and interactive quizzes



Information

| Property | Value |
| Language | Python |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 1 |
| License | No License |
| Created | 2026-03-07 |
| Last Updated | 2026-03-12 |
| Last Push | 2026-03-18 |
| Contributors | 1 |
| Default Branch | main |
| Visibility | private |

Notebooks

This repository contains 6 notebooks:

| Notebook | Language | Type |
| 01_data_exploration | PYTHON | jupyter |
| 02_clustering | PYTHON | jupyter |
| 03_nlp_sentiment | PYTHON | jupyter |
| 04_topic_modeling | PYTHON | jupyter |
| 06_classification | PYTHON | jupyter |
| 07_structured_output | PYTHON | jupyter |

Datasets

This repository includes 25 datasets:

| Dataset | Format | Size |
| questions.json | .json | 3.28 KB |
| questions.json | .json | 3.22 KB |
| questions.json | .json | 3.19 KB |
| questions.json | .json | 3.13 KB |
| questions.json | .json | 2.81 KB |
| questions.json | .json | 4.29 KB |
| questions.json | .json | 3.52 KB |
| questions.json | .json | 3.02 KB |
| charts.json | .json | 27.44 KB |
| data | | 0.0 KB |
| cached_responses | | 0.0 KB |
| README.md | .md | 0.75 KB |
| structured_output_samples.json | .json | 10.08 KB |
| derived | | 0.0 KB |
| cluster_labels.csv | .csv | 42.46 KB |
| cross_references.csv | .csv | 23.49 KB |
| model_predictions.csv | .csv | 26.27 KB |
| sentiment_scores.csv | .csv | 17.59 KB |
| shap_values.csv | .csv | 36.65 KB |
| text_features.csv | .csv | 15.42 KB |
| topic_model.csv | .csv | 51.7 KB |
| topic_words.json | .json | 3.72 KB |
| generate_dataset.py | .py | 12.05 KB |
| generate_derived.py | .py | 12.21 KB |
| swiss_innovation_survey.csv | .csv | 248.75 KB |

Reproducibility

This repository includes reproducibility tools:

  • Python requirements.txt

Status

  • Issues: Enabled
  • Wiki: Enabled
  • Pages: Enabled

README

ML for Innovation Research: PhD/DBA Seminar

A hands-on seminar in two 3-hour sessions that teaches DBA/PhD students how to apply machine learning as a research tool for innovation studies.

Course Structure

| Module | Topic | Key Techniques |
| 01 | Opening: ML for Innovation Research | ML paradigms, workflow, dataset intro |
| 02 | Clustering & PCA | K-Means, PCA/UMAP, innovator archetypes |
| 03 | NLP & Sentiment Analysis | VADER sentiment, TF-IDF, text analysis |
| 04 | Synthesis & Review | LDA topic modeling, cross-tabulation |
| 05 | Recap & Supervised Learning | Session 1 recap, features, labels |
| 06 | Classification | Random Forest, Logistic Regression, ROC/AUC |
| 07 | Generative AI | LLMs, prompt engineering, structured output |
| 08 | Complete Toolkit & Closing | Decision framework, thesis patterns |

Folder Structure

ml-for-innovation-phd-seminar/
├── 01_opening/                  # Module folders with slides, notebooks, charts
│   ├── 01_opening.tex           # Standalone Beamer slides
│   ├── 01_opening_quiz.tex      # Quiz slides
│   ├── 01_data_exploration.ipynb # Demo notebook
│   ├── 01_innovation_data_overview/
│   │   ├── chart.py             # Standalone chart generator
│   │   └── chart.pdf            # Generated chart
│   └── questions.json           # Quiz questions
├── 02_clustering/ ... 08_toolkit/
├── cheatsheets/                 # One-page reference sheets per module
├── quiz/                        # Interactive HTML quizzes
├── js/                          # Shared quiz JavaScript engine
├── data/                        # Dataset and cached LLM responses
├── handouts/                    # Toolkit card and reading list
├── template_beamer_final.tex    # Shared Beamer preamble
├── notation.tex                 # ML notation shortcuts
├── references.bib               # Bibliography entries
├── charts.json                  # Chart registry
├── index.html                   # Course web page
├── compile.py                   # Build script
├── extract_charts.py            # Regenerate all charts
├── compile_quizzes.py           # Compile all quiz PDFs
└── generate_thumbnails.py       # Generate PDF thumbnails

Setup

pip install -r requirements.txt
python data/generate_dataset.py        # regenerate dataset (optional)
python extract_charts.py               # generate all chart PDFs
python compile.py --all                # compile everything

Building

python compile.py                  # module slides only
python compile.py --charts         # regenerate charts + compile slides
python compile.py --quizzes        # compile quiz PDFs
python compile.py --cheatsheets    # compile cheatsheet PDFs
python compile.py --all            # everything
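
The flags above suggest a small additive command-line interface. The following is a minimal sketch of how such flags could be mapped to build steps with `argparse`. It is not the repository's actual `compile.py`: the function names `build_parser` and `plan_steps`, the step labels, and the assumption that `--quizzes` and `--cheatsheets` skip the slides are all illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in the README."""
    parser = argparse.ArgumentParser(
        description="Build seminar materials (illustrative sketch)."
    )
    parser.add_argument("--charts", action="store_true",
                        help="regenerate charts, then compile slides")
    parser.add_argument("--quizzes", action="store_true",
                        help="compile quiz PDFs")
    parser.add_argument("--cheatsheets", action="store_true",
                        help="compile cheatsheet PDFs")
    parser.add_argument("--all", action="store_true",
                        help="run every step")
    return parser


def plan_steps(argv):
    """Return the ordered build steps implied by the flags (assumed step names)."""
    args = build_parser().parse_args(argv)
    if args.all:
        return ["charts", "slides", "quizzes", "cheatsheets"]
    steps = []
    if args.charts:
        steps += ["charts", "slides"]
    if args.quizzes:
        steps.append("quizzes")
    if args.cheatsheets:
        steps.append("cheatsheets")
    # With no flags, the README says only the module slides are compiled.
    return steps or ["slides"]
```

Under these assumptions, `plan_steps(["--charts"])` returns `['charts', 'slides']` and `plan_steps([])` returns `['slides']`.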

Dataset

data/swiss_innovation_survey.csv: 500 synthetic innovation projects, with 4 latent innovator archetypes embedded for clustering exercises.
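
Before running the notebooks, it can help to confirm that the file on disk matches this description. The sketch below uses only the standard library and checks the row count against the 500 projects stated above. It assumes the CSV has a header row and is UTF-8 encoded; the helper names `summarize_csv` and `check_dataset` are hypothetical, and no column names are assumed because the README does not list them.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (row_count, column_names) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = list(reader.fieldnames or [])
    return len(rows), columns


def check_dataset(path="data/swiss_innovation_survey.csv", expected_rows=500):
    """Compare the file's row count with the figure stated in the README."""
    n_rows, columns = summarize_csv(path)
    return {
        "rows": n_rows,
        "columns": columns,
        "matches_readme": n_rows == expected_rows,
    }
```

Calling `check_dataset()` from the repository root should report `matches_readme` as `True` if the file has not been modified. After `python data/generate_dataset.py`, the same check applies to the regenerated file.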

Requirements

  • Python 3.9+
  • LaTeX with Beamer (TeX Live or MiKTeX)
  • No API keys needed (structured output demo uses cached responses)
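
The last requirement implies that the structured output demo reads stored LLM responses instead of calling a live API. A minimal sketch of that cache-first pattern follows. The JSON layout (an object mapping a prompt key to a stored response), the function names `load_cache` and `get_cached_response`, and the keys are assumptions, since the README does not describe the format of the cached files.

```python
import json
from pathlib import Path


def load_cache(path):
    """Load a JSON file assumed to map prompt keys to stored responses."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping keys to responses")
    return data


def get_cached_response(cache, key):
    """Return the stored response; fail loudly instead of calling an API."""
    try:
        return cache[key]
    except KeyError:
        raise KeyError(
            f"no cached response for {key!r}; no live API call is attempted"
        ) from None
```

Raising on a cache miss, rather than falling back to a network call, keeps the demo reproducible offline and makes a missing entry obvious.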