Clustering

Level: Intermediate · Duration: 90 minutes

Discovering natural groupings in data without predefined labels.

Learning Outcomes

By completing this topic, you will:

  • Apply K-means, DBSCAN, and hierarchical clustering
  • Choose the optimal number of clusters
  • Interpret and validate cluster results
  • Create customer personas from segments

Visual Guides

  • K-Means Algorithm
  • Elbow Method
  • Silhouette Scores

Prerequisites

  • Unsupervised Learning concepts
  • Distance metrics (Euclidean, Manhattan)
  • Feature scaling techniques

Key Concepts

K-Means Clustering

Partition data into K clusters; works best when clusters are compact and roughly spherical:

  1. Initialize K centroids
  2. Assign points to nearest centroid
  3. Update centroids as cluster means
  4. Repeat until convergence

Choosing K: elbow method (plot within-cluster sum of squares against K and look for the bend), silhouette score (higher is better)
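The four steps above and the silhouette-based choice of K can be sketched with scikit-learn; the synthetic blob centers and parameter values below are illustrative assumptions, not part of the lesson:

```python
# Minimal sketch: K-means on synthetic blobs, choosing K by silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated blobs, so the "right" K is known to be 4
X, _ = make_blobs(n_samples=400,
                  centers=[[0, 0], [5, 5], [0, 5], [5, 0]],
                  cluster_std=0.5, random_state=42)

# Try several K; KMeans runs the init/assign/update/repeat loop internally
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better

best_k = max(scores, key=scores.get)
print(best_k)
```

On data this cleanly separated the silhouette peak recovers the true cluster count; on real data the peak is often less pronounced and should be cross-checked against the elbow plot.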

DBSCAN

Density-based clustering for arbitrary shapes:

  • Epsilon: Neighborhood radius
  • MinPts: Minimum number of neighbors for a point to count as a core point
  • Automatically detects noise points
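A short sketch of these parameters in action, using the two-moons dataset that K-means would split incorrectly; the `eps` and `min_samples` values are assumptions chosen for this toy data:

```python
# DBSCAN on two half-moon clusters: arbitrary shapes plus noise labels.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)  # scale before distance-based clustering

# eps = neighborhood radius (Epsilon), min_samples = MinPts
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Noise points get the special label -1
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

Unlike K-means, the number of clusters is not specified up front; it falls out of the density parameters.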

Hierarchical Clustering

Build nested cluster hierarchy:

  • Agglomerative (bottom-up)
  • Divisive (top-down)
  • Dendrograms for visualization
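The agglomerative (bottom-up) variant can be sketched with SciPy: build the merge tree with Ward linkage, then cut it into flat clusters; the toy two-group data is an assumption for illustration:

```python
# Agglomerative clustering via SciPy's linkage + fcluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),   # group A near (0, 0)
               rng.normal(5.0, 0.3, size=(20, 2))])  # group B near (5, 5)

Z = linkage(X, method="ward")  # merge tree: one row per pairwise merge
# Z is what scipy.cluster.hierarchy.dendrogram draws; fcluster cuts the
# tree into a chosen number of flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(sorted(set(labels)))
```

Inspecting the dendrogram before choosing the cut height is the main exploratory advantage over K-means, where K must be fixed in advance.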

When to Use

Algorithm      Best For
K-means        Spherical clusters, known K
DBSCAN         Arbitrary shapes, noise detection
Hierarchical   Exploring cluster structure

Common Pitfalls

  • Not scaling features before clustering
  • Choosing K based on convenience, not data
  • Ignoring cluster interpretability
  • Using K-means on non-spherical data
  • Forgetting to validate cluster stability
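The first pitfall is easy to demonstrate. In the sketch below (the age/income variables are invented for illustration), only age carries cluster structure, but raw income's large magnitude dominates the Euclidean distances:

```python
# Why unscaled features distort K-means: income swamps age in raw distances.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
age = np.concatenate([rng.normal(25, 2, 50), rng.normal(60, 2, 50)])
income = rng.normal(50_000, 5_000, 100)  # deliberately structureless
X = np.column_stack([age, income])

raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

# After scaling, the two age groups are recovered cleanly; on the raw
# data the split follows income noise instead.
print(set(scaled[:50]), set(scaled[50:]))
```

Standardizing (or otherwise scaling) every feature before clustering is the usual remedy, since all three algorithms above rely on distances.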

(c) Joerg Osterrieder 2025