Classification

Level: Intermediate | Duration: 90 minutes

Categorizing data into predefined classes using tree-based methods.

Learning Outcomes

By completing this topic, you will:

  • Build and interpret decision trees
  • Apply random forests for robust predictions
  • Handle imbalanced classification problems
  • Evaluate classifiers with appropriate metrics

Visual Guides

  • Figure: Decision Boundary
  • Figure: Confusion Matrix
  • Figure: Decision Tree

Prerequisites

  • Supervised Learning concepts
  • Understanding of entropy and information gain
  • Basic probability concepts

Key Concepts

Decision Trees

Interpretable rule-based classifiers:

  • Split data based on feature thresholds
  • Information gain guides split selection
  • Pruning (or limiting tree depth) reduces overfitting
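
To make the split-selection step concrete, here is a minimal sketch of how information gain can rank candidate thresholds for a single numeric feature. It uses only the Python standard library. The function names and the toy data (`ages`, `bought`) are illustrative assumptions, not part of the course material, and library implementations differ in detail (for example, many default to Gini impurity rather than entropy).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return the best one."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None  # a constant feature cannot be split
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical toy data: one numeric feature and a binary label.
ages = [22, 25, 30, 35, 40, 45, 50, 55]
bought = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = best_threshold(ages, bought)
print(threshold, information_gain(ages, bought, threshold))
```

On this toy data the classes separate cleanly between 35 and 40, so the midpoint 37.5 removes all of the one bit of label entropy. A full tree would repeat the same search recursively on each side of the split.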

Random Forests

Ensemble of decision trees:

  • Bootstrap aggregating (bagging)
  • Random feature selection
  • Reduced variance, improved accuracy
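
The two sources of randomness listed above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not a production random forest: each "tree" is a depth-1 stump, thresholds are taken from observed values rather than midpoints, and all names (`fit_stump`, `fit_forest`, the toy `rows` and `labels`) are assumptions introduced here. Because a stump has only one split, choosing a feature subset per tree is the same as choosing it per split.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) pair with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: bootstrap sample of the same size, drawn with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature selection: this stump may use only a feature subset.
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature_ids)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features, binary label.
rows = [(1, 1), (2, 1), (2, 2), (3, 2), (7, 8), (8, 7), (8, 9), (9, 8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))
```

Each stump sees a slightly different dataset and feature, so individual stumps disagree near the class boundary. Averaging their votes is what reduces variance relative to a single tree.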

Evaluation Metrics

  • Accuracy: Correct predictions / all predictions
  • Precision: True positives / predicted positives
  • Recall: True positives / actual positives
  • F1-score: Harmonic mean of precision and recall
  • ROC-AUC: Probability that a randomly chosen positive is scored above a randomly chosen negative
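
As a worked example of these definitions, the following sketch computes each metric directly from labels, predictions, and scores using only the standard library. The helper names and the toy arrays are assumptions made for illustration. Returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Here there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so precision, recall, and F1 are all 0.75 while accuracy is 0.8. The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration but not for large datasets.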

When to Use

Method             Best For
Decision Tree      Interpretability, simple rules
Random Forest      Accuracy, feature importance
Gradient Boosting  Maximum performance

Common Pitfalls

  • Overfitting with deep trees
  • Ignoring class imbalance
  • Using accuracy on skewed datasets
  • Not tuning hyperparameters
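  • Forgetting to scale features for distance- or gradient-based algorithms (e.g., k-NN, SVM, logistic regression); tree-based methods do not require scaling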
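
The accuracy and class-imbalance pitfalls can be seen in a small sketch: a model that always predicts the majority class looks strong on accuracy yet never finds a positive. The data and function names below are hypothetical. The weighting formula, n_samples / (n_classes * class_count), is the "balanced" heuristic that scikit-learn documents for `class_weight="balanced"`; it is shown here as one common option, not the only remedy.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """Fraction of each class's samples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def balanced_class_weights(y_true):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y_true)
    return {c: len(y_true) / (len(counts) * n) for c, n in counts.items()}


# Hypothetical skewed dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless baseline that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # looks good
print(recall_per_class(y_true, y_pred))   # but no positive is ever found
print(balanced_accuracy(y_true, y_pred))
print(balanced_class_weights(y_true))
```

The baseline reaches 95% accuracy with zero recall on the positive class, and its balanced accuracy is only 0.5. The class weights make each of the 5 positives count roughly 19 times as much as a negative during training, which is one way to stop a model from ignoring the minority class.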

(c) Joerg Osterrieder 2025