In-Class Activity: PCA vs. EFA

Work in pairs — 20 minutes

Part A: Reading PCA Output (10 minutes)

No R needed. A consumer survey collected four variables: Spending, Income, Education, and Age. PCA was run for you on the standardized variables, i.e., on the correlation matrix. Use the output that follows to answer the questions.

Correlation matrix:

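|           | Spending | Income | Education | Age  |
|-----------|----------|--------|-----------|------|
| Spending  | 1.00     | 0.85   | 0.72      | 0.15 |
| Income    | 0.85     | 1.00   | 0.68      | 0.20 |
| Education | 0.72     | 0.68   | 1.00      | 0.10 |
| Age       | 0.15     | 0.20   | 0.10      | 1.00 |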

Eigenvalues (PC1 to PC4): 2.62  |  0.98  |  0.28  |  0.12

PC1 loadings: Spending = 0.58, Income = 0.57, Education = 0.54, Age = 0.18
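If you want to see where PC1 comes from, the sketch below recovers the leading eigenvalue and eigenvector of the correlation matrix using power iteration. The activity mentions R, but this sketch is plain Python, and any eigen-solver would work just as well. It assumes the printed "loadings" are unit-length eigenvector weights, which fits the fact that their squares sum to about 1. The function name `power_iteration` is illustrative and does not come from the course materials.

```python
import math

# Two-decimal correlation matrix from Part A; order: Spending, Income, Education, Age.
names = ["Spending", "Income", "Education", "Age"]
R = [
    [1.00, 0.85, 0.72, 0.15],
    [0.85, 1.00, 0.68, 0.20],
    [0.72, 0.68, 1.00, 0.10],
    [0.15, 0.20, 0.10, 1.00],
]


def power_iteration(A, iters=200):
    """Approximate the largest eigenvalue and its unit eigenvector of a symmetric matrix A."""
    n = len(A)
    v = [1.0] * n  # all-ones start; it overlaps PC1 because every correlation here is positive
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # rescale to unit length each step
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(vi * avi for vi, avi in zip(v, Av))  # Rayleigh quotient v^T A v (v is unit length)
    return lam, v


lam1, pc1 = power_iteration(R)
print(f"largest eigenvalue: {lam1:.2f}")
for name, weight in zip(names, pc1):
    print(f"{name:<10} {weight:.2f}")
```

On the two-decimal matrix above, this sketch gives a first eigenvalue of about 2.55 (not the printed 2.62) and weights of roughly 0.59, 0.58, 0.54 and 0.17. The weights closely match the printed PC1 loadings, so the interpretation of PC1 does not change. The answers below use the printed values.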

a) Which two variables are most redundant? What is the correlation between them?

b) How many components would you keep according to Kaiser’s rule (eigenvalue > 1)?

c) What real-world concept does PC1 represent? Give it a name.

d) What percentage of total variance does PC1 capture? Show the calculation.

Answers

a) Most redundant pair

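Spending and Income have the highest correlation: $r = 0.85$. Knowing one gives you most of the information about the other. In a simple linear regression, either variable explains $r^2 = 0.85^2 \approx 0.72$ (about 72%) of the other's variance.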
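Finding the most redundant pair amounts to a search over the upper triangle of the correlation matrix. The sketch below does this in plain Python. The function name `most_redundant_pair` is hypothetical. Ranking by |r| is a choice made here, not something the activity specifies: it means a strong negative correlation would also count as redundant.

```python
# Correlation matrix from Part A; order: Spending, Income, Education, Age.
names = ["Spending", "Income", "Education", "Age"]
R = [
    [1.00, 0.85, 0.72, 0.15],
    [0.85, 1.00, 0.68, 0.20],
    [0.72, 0.68, 1.00, 0.10],
    [0.15, 0.20, 0.10, 1.00],
]


def most_redundant_pair(names, R):
    """Return (name_i, name_j, r) for the off-diagonal entry with the largest |r|."""
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only; skips the diagonal of 1s
            if best is None or abs(R[i][j]) > abs(best[2]):
                best = (names[i], names[j], R[i][j])
    return best


a, b, r = most_redundant_pair(names, R)
print(f"{a} & {b}: r = {r:.2f}, r^2 = {r * r:.2f}")
```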

b) Kaiser's rule

Only the first eigenvalue (2.62) exceeds 1. The second (0.98) falls just short. Kaiser's rule therefore says to keep 1 component.
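Kaiser's rule is just a filter on the eigenvalues. Here is a minimal sketch using the printed values. `kaiser_count` is a hypothetical helper name. The cutoff of 1 assumes a correlation-matrix PCA, where the average eigenvalue is 1.

```python
# Printed eigenvalues from Part A, PC1 to PC4.
eigenvalues = [2.62, 0.98, 0.28, 0.12]


def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of components whose eigenvalue exceeds the cutoff."""
    return sum(1 for lam in eigenvalues if lam > cutoff)


print(kaiser_count(eigenvalues))  # components kept under Kaiser's rule
```

Lowering `cutoff` to 0.9 would keep two components. This shows how close the second eigenvalue sits to the boundary.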

c) Name for PC1

"Economic Status" or "Socioeconomic Standing." Spending (0.58), Income (0.57), and Education (0.54) all carry large, nearly equal weights on PC1. Together, these three variables reflect how affluent a consumer is. Age (0.18) contributes little to this component.

d) Variance explained by PC1

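Because the PCA was run on standardized variables (the correlation matrix), each variable has variance 1, so the total variance is 4. This also equals the sum of the eigenvalues: 2.62 + 0.98 + 0.28 + 0.12 = 4.00. PC1 therefore captures $2.62 / 4 = 0.655$, i.e., 65.5% of total variance.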
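The same calculation extends to every component, and to cumulative shares, which is the usual way to report variance explained. Here is a minimal sketch with the printed eigenvalues. `variance_explained` is a hypothetical helper name.

```python
from itertools import accumulate

# Printed eigenvalues from Part A, PC1 to PC4.
eigenvalues = [2.62, 0.98, 0.28, 0.12]


def variance_explained(eigenvalues):
    """Return per-component and cumulative shares of total variance."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation-matrix PCA
    shares = [lam / total for lam in eigenvalues]
    return shares, list(accumulate(shares))


shares, cumulative = variance_explained(eigenvalues)
for k, (s, c) in enumerate(zip(shares, cumulative), start=1):
    print(f"PC{k}: {s:.1%} of variance, {c:.1%} cumulative")
```

With these values, PC1 and PC2 together account for 90% of the total variance.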

Part B: Now Think Like an EFA Analyst (10 minutes)

Same four variables, same data. A marketing researcher believes there are 2 hidden factors: Wealth and Life Stage. The unrotated 2-factor solution is shown below.

Unrotated factor loading matrix:

|           | Factor 1 | Factor 2 |
|-----------|----------|----------|
| Spending  | 0.82     | 0.12     |
| Income    | 0.80     | 0.15     |
| Education | 0.74     | −0.08    |
| Age       | 0.14     | 0.95     |

a) Assign each variable to the factor it loads most strongly on, using an absolute loading above 0.4 as the cutoff. Which factor is "Wealth"? Which is "Life Stage"?

b) Compute the communality of Spending. What does it mean?

c) Compare with your Part A answer. What can EFA tell you that PCA cannot?

d) Vote with your partner: would you recommend PCA or EFA for this marketing study? Why?

Answers

a) Factor assignment

Spending (0.82), Income (0.80), and Education (0.74) load on Factor 1 → Wealth.
Age (0.95) loads on Factor 2 → Life Stage.
(Education’s small negative loading on Factor 2, −0.08, is well below the 0.4 threshold in absolute value and can be ignored.)
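The assignment rule (take the largest absolute loading, and keep it only if it clears 0.4) can be written as a short function. Here is a sketch in plain Python. The names `assign_factors` and `FACTOR_LABELS` are hypothetical. The Wealth and Life Stage labels reflect the researcher's interpretation; the algorithm does not produce them.

```python
# Unrotated two-factor loadings from Part B: (Factor 1, Factor 2).
loadings = {
    "Spending": (0.82, 0.12),
    "Income": (0.80, 0.15),
    "Education": (0.74, -0.08),
    "Age": (0.14, 0.95),
}
FACTOR_LABELS = ("Wealth", "Life Stage")  # researcher-supplied names for Factor 1 and Factor 2


def assign_factors(loadings, cutoff=0.4):
    """Map each variable to the index of its largest |loading|, or None if none clears the cutoff."""
    result = {}
    for var, row in loadings.items():
        best = max(range(len(row)), key=lambda k: abs(row[k]))
        result[var] = best if abs(row[best]) > cutoff else None
    return result


for var, k in assign_factors(loadings).items():
    print(f"{var:<10} -> {FACTOR_LABELS[k] if k is not None else 'unassigned'}")
```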

b) Communality of Spending

$h^2 = 0.82^2 + 0.12^2 = 0.6724 + 0.0144 = 0.6868 \approx 0.69$. About 69% of Spending’s variance is explained by the two common factors combined. The remaining 31% ($1 - h^2$) is unique variance: variance specific to Spending plus measurement error, not shared with the other variables.
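Applying the same formula to every row gives all four communalities and uniquenesses at once. Here is a minimal sketch. `communalities` is a hypothetical helper. The formula $u^2 = 1 - h^2$ assumes standardized variables (variance 1), as in a factor analysis of the correlation matrix.

```python
# Unrotated two-factor loadings from Part B: (Factor 1, Factor 2).
loadings = {
    "Spending": (0.82, 0.12),
    "Income": (0.80, 0.15),
    "Education": (0.74, -0.08),
    "Age": (0.14, 0.95),
}


def communalities(loadings):
    """Communality h^2 per variable: the sum of its squared loadings across factors."""
    return {var: sum(loading * loading for loading in row) for var, row in loadings.items()}


h2 = communalities(loadings)
for var, value in h2.items():
    # uniqueness u^2 = 1 - h^2 for a standardized variable
    print(f"{var:<10} h2 = {value:.2f}   u2 = {1 - value:.2f}")
```

Age has the highest communality (about 0.92) and Education the lowest (about 0.55).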

c) EFA vs. PCA insight

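With only one component retained under Kaiser’s rule, PCA collapsed the data into a single dimension in which Age is nearly invisible (weight 0.18). EFA separates an age dimension (Life Stage) cleanly from the wealth dimension, showing that consumers differ along two distinct axes. EFA also splits each variable’s variance into a shared part (the communality) and a unique part, a distinction PCA does not make. Note that the PCA result depends partly on the retention rule: the second eigenvalue (0.98) falls just below Kaiser’s cutoff of 1.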
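To put a number on "nearly invisible," the sketch below compares two quantities: how much of Age's variance PC1 carries, and Age's communality in the two-factor solution. It assumes the printed PC1 values are unit-length eigenvector weights (their squares sum to about 1). Under that assumption, the correlation between a variable and PC1 is its weight times $\sqrt{\lambda_1}$. The variable names are illustrative.

```python
import math

lam1 = 2.62                          # printed first eigenvalue (Part A)
age_pc1_weight = 0.18                # printed PC1 weight for Age (assumed unit-length eigenvector entry)
age_factor_loadings = (0.14, 0.95)   # Age row of the unrotated factor matrix (Part B)

# Correlation between Age and PC1, then its square = share of Age's variance carried by PC1.
r_age_pc1 = age_pc1_weight * math.sqrt(lam1)
share_pc1 = r_age_pc1 ** 2

# Age's communality in the two-factor EFA solution.
share_efa = sum(loading * loading for loading in age_factor_loadings)

print(f"Age variance carried by PC1:         {share_pc1:.1%}")
print(f"Age variance explained by 2 factors: {share_efa:.1%}")
```

With the printed numbers, PC1 carries roughly 8.5% of Age's variance, while the two factors explain about 92%. Because this compares one retained component with two factors, it reflects the retention decision as much as the PCA/EFA distinction.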

d) Recommendation

For this marketing study, EFA is more appropriate. The researcher has a theoretical model (two hidden factors) and wants to check whether the data are consistent with it. PCA is the better choice when the goal is purely to reduce dimensionality, with no theory about latent causes.

Key takeaway: PCA compresses data into fewer dimensions. EFA uncovers the hidden structure. Same data, different questions!
Statistical Data Analysis  |  Digital AI Finance  |  BSc Data Science  |  © Joerg Osterrieder 2025–2026