Pre-Class Discovery: PCA & t-SNE | Methods & Algorithms

Activity 1

"Too Many Numbers"

Stock	Return (%)	Volatility (%)	Volume (M)	P/E	Beta
AAPL	12	22	80	28	1.2
MSFT	15	25	65	32	1.3
JPM	8	18	45	12	1.1
GS	10	28	30	10	1.4
JNJ	6	12	25	18	0.7
PFE	7	15	35	15	0.8

Your Task

Can you visualize all 5 dimensions on paper?
Which features seem to move together (correlated)?
If you could keep only 2 summary features, what would they capture?

Reveal Solution

With 5 features, direct visualization is impossible. But notice: Return and Beta move together, Volatility and Beta move together. PCA finds the best linear combinations that capture the most variance. The first principal component might be a "risk" factor, the second a "size" factor.

Activity 2

"Find the Direction of Maximum Spread"

2D data cloud with principal component arrows showing directions of maximum variance

Your Task

If you had to summarize this 2D cloud with one line, where would you draw it?
Why along the widest spread?
What does the second arrow capture?

Reveal Solution

The first principal component (PC1) points in the direction of maximum variance. Projecting onto this direction loses the least information. PC2 is perpendicular to PC1 and captures the remaining variance. Together they form a new coordinate system aligned with the data's natural axes.

Activity 3

"How Many Components?"

Scree plot showing explained variance ratio for each principal component

Your Task

How much variance does the first component capture?
At which component does adding more stop helping much?
If you keep 3 components, roughly what percentage of total variance do you retain?

Reveal Solution

The scree plot shows explained variance per component. Look for the "elbow" -- where the curve flattens. Components after the elbow add little. A common rule: keep enough components to explain 90-95% of total variance. This is dimensionality reduction -- fewer features, nearly the same information.

Activity 4

"Compress and Reconstruct"

Original data versus PCA reconstruction showing information loss with fewer components

Your Task

What information was lost when reducing to fewer components?
Is the reconstruction close to the original?
When is a lossy approximation acceptable?

Reveal Solution

PCA reconstruction: $\hat{X} = X_{\text{reduced}} \cdot W^T$. With fewer components, fine details are lost but major patterns are preserved. Acceptable when: the lost variance is noise, or you need speed/storage savings. This is similar to JPEG compression -- lose some detail, keep the essence.

Activity 5

"The Map of Similarities"

t-SNE visualizations of the same data with different perplexity settings

Your Task

Which points are clustered together?
Does the distance between clusters have meaning?
The same data is shown 3 times with different settings -- what changed?

Reveal Solution

t-SNE is a nonlinear method that preserves local neighborhoods -- similar points stay close. But unlike PCA, distances between distant clusters are NOT meaningful. The perplexity parameter controls how many neighbors each point considers: low perplexity = tight clusters, high perplexity = more global structure.

Activity 6

"PCA vs. t-SNE: Which to Use?"

Side-by-side comparison of PCA and t-SNE projections of the same dataset

Your Task

Which visualization preserves global distances better?
Which shows clusters more clearly?
Can you apply t-SNE to new data without recomputing everything?

Reveal Solution

PCA: linear, fast, invertible, preserves global structure -- use for feature reduction, preprocessing, or when you need to transform new data. t-SNE: nonlinear, slow, not invertible, reveals clusters -- use for visualization only. You cannot apply a t-SNE mapping to new points without rerunning on the full dataset.