Pre-Class Discovery: Embeddings & RL

Activity 1

"Words as Numbers"

Financial Term
stock
bond
interest
dividend
inflation
risk
return
portfolio

Your Task

Which pairs are most similar in meaning? Rank the top 3 pairs.
How would you assign each word a score from 0-10 for "riskiness"?
Could you add a second dimension -- say "income-generating" -- and plot the words on a 2D map?

Reveal Solution

You just created a 2D word embedding by hand! Real embeddings (Word2Vec, GloVe) learn vectors with typically 100-300 dimensions from text corpora containing billions of words. Words used in similar contexts end up with similar vectors, so we would expect, for example, $\cos(\vec{\text{stock}}, \vec{\text{bond}}) > \cos(\vec{\text{stock}}, \vec{\text{inflation}})$.
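To make the hand-built embedding concrete, here is a minimal Python sketch. The scores are made-up illustrations (your own riskiness and income ratings will differ), and `embedding`, `cosine`, and `ranked` are names chosen for this example only.

```python
import math
from itertools import combinations

# Hypothetical hand-assigned scores: (riskiness, income-generating), each 0-10.
embedding = {
    "stock": (9, 3),
    "bond": (3, 7),
    "dividend": (4, 9),
    "interest": (2, 8),
}

def cosine(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Rank every pair of words from most to least similar.
ranked = sorted(
    combinations(embedding, 2),
    key=lambda pair: cosine(embedding[pair[0]], embedding[pair[1]]),
    reverse=True,
)

for w1, w2 in ranked[:3]:
    print(f"{w1:>8} ~ {w2:<8} {cosine(embedding[w1], embedding[w2]):.3f}")
```

Note that cosine similarity compares only the direction of two vectors, not their length, so two words with proportional scores count as nearly identical.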

Activity 2

"The Meaning Map"

2D projection of word embedding space showing financial terms clustered by semantic similarity

Your Task

Which words are close together in the map?
Does their proximity match your intuition from Activity 1?
Famous example: king - man + woman ≈ queen (the result is the nearest vector, not an exact match). What financial analogy could work?

Reveal Solution

The embedding space captures semantic relationships as geometric relationships. Possible financial analogy: stock - risk + safety ≈ bond. Nobody programs these arithmetic regularities in; they tend to emerge from training on large text corpora, because the model learns each word's meaning from the contexts in which it appears.
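The analogy arithmetic can be sketched in a few lines of Python. The 2D vectors below are invented so that the example works out cleanly; real embeddings are learned, have far more dimensions, and give only approximate matches. The function name `analogy` and the axis labels are assumptions for this illustration.

```python
import math

# Hypothetical toy vectors: (risk level, "is an investable asset").
vectors = {
    "stock": (0.9, 0.9),
    "bond": (0.1, 0.9),
    "risk": (0.9, 0.1),
    "safety": (0.1, 0.1),
    "inflation": (0.6, 0.2),
    "dividend": (0.4, 0.7),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(start, minus, plus):
    """Return the word whose vector is closest to start - minus + plus."""
    target = tuple(
        s - m + p
        for s, m, p in zip(vectors[start], vectors[minus], vectors[plus])
    )
    # Exclude the three input words, as is conventional for analogy tests.
    candidates = [w for w in vectors if w not in (start, minus, plus)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("stock", "risk", "safety"))
```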

Activity 3

"How Similar?"

Cosine similarity heatmap between financial term embeddings

Your Task

Which pair has the highest similarity score?
Are any words surprisingly similar or different compared to your expectations?
How might a bank use these similarity scores?

Reveal Solution

Cosine similarity is the cosine of the angle between two vectors: $\cos(\theta) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}$. It ranges from -1 (opposite directions) through 0 (perpendicular, unrelated) to +1 (same direction). Banks use embeddings for document search, customer query matching, sentiment analysis, and fraud detection (flagging unusual language patterns).
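A short worked example may help; the vectors are chosen for easy arithmetic and do not come from a real model. For $\vec{a} = (3, 4)$ and $\vec{b} = (4, 3)$:

$$\cos(\theta) = \frac{(3)(4) + (4)(3)}{\sqrt{3^2 + 4^2} \, \sqrt{4^2 + 3^2}} = \frac{24}{5 \cdot 5} = 0.96$$

For $\vec{a} = (3, 4)$ and its mirror image $\vec{c} = (-3, -4)$:

$$\cos(\theta) = \frac{(3)(-3) + (4)(-4)}{5 \cdot 5} = \frac{-25}{25} = -1$$

The first pair points in nearly the same direction (high similarity); the second points in exactly opposite directions.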

Activity 4

"The Learning Loop"

Reinforcement learning loop showing agent, environment, state, action, and reward

Your Task

A robot trader can buy, sell, or hold. It earns +1 for profit, -1 for loss. What should it learn?
How is this different from supervised learning (where someone tells you the right answer)?
What role does the "state" play?

Reveal Solution

In reinforcement learning, the agent learns a policy (a mapping from states to actions) that maximizes cumulative reward, and it does so by trial and error. Unlike supervised learning, there is no labeled dataset of correct answers; the agent discovers good actions through the rewards it receives. The state (e.g., current prices, portfolio, market indicators) is the information the agent uses to decide which action is best.
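The loop of state, action, and reward can be sketched as follows. Everything here is a simplifying assumption: the price series is invented, the state is just the direction of the last price move, and a "sell" before a price drop is counted as a profit. The names `step`, `run_episode`, and `momentum_policy` are hypothetical.

```python
import random

ACTIONS = ["buy", "sell", "hold"]

def step(prices, t, action):
    """Toy environment: return (next_state, reward) for acting at time t."""
    change = prices[t + 1] - prices[t]
    if action == "hold" or change == 0:
        reward = 0
    elif (action == "buy") == (change > 0):
        reward = 1   # bought before a rise, or sold before a fall
    else:
        reward = -1  # bought before a fall, or sold before a rise
    next_state = "up" if change > 0 else "down" if change < 0 else "flat"
    return next_state, reward

def run_episode(policy, prices):
    """Run one pass over the price series and return the total reward."""
    state, total = "flat", 0
    for t in range(len(prices) - 1):
        action = policy(state)                    # agent chooses from the state
        state, reward = step(prices, t, action)   # environment responds
        total += reward
    return total

def random_policy(state):
    return random.choice(ACTIONS)

def momentum_policy(state):
    # Hand-written rule: bet that the last move continues.
    return {"up": "buy", "down": "sell"}.get(state, "hold")

prices = [100, 101, 102, 101, 100, 101, 102, 103]
print("random  :", run_episode(random_policy, prices))
print("momentum:", run_episode(momentum_policy, prices))
```

A learning agent would replace the hand-written rule with a policy it improves from the rewards it observes.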

Activity 5

"Navigate the Grid"

Q-learning grid world showing state values and optimal path

Your Task

The agent starts at the top-left and wants to reach the goal. What path would you take?
Some cells have penalties -- how do you avoid them?
If you stored a "goodness score" for each cell+direction, how would you update it after each move?

Reveal Solution

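This is Q-learning. The Q-table stores $Q(s, a)$, the expected total discounted reward from taking action $a$ in state $s$ and acting optimally afterwards. Update rule: $Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$, where $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r$ is the reward just received, and $s'$ is the next state. Over many episodes, provided the agent keeps trying every state-action pair, the Q-values converge toward their optimal values and the agent learns the best path.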
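The update rule translates almost directly into code. This is a minimal sketch, assuming grid cells are identified by `(row, column)` tuples and the Q-table is a dictionary keyed by `(state, action)`; the function name `q_update` and the example move are illustrative.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)  # max over a' of Q(s', a')
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Hypothetical move: from cell (0, 0) step right into a penalty cell (0, 1).
new_value = q_update(Q, (0, 0), "right", reward=-1.0, next_state=(0, 1), alpha=0.5)
print(new_value)
```

With all values initially zero, the update moves $Q(s,a)$ halfway ($\alpha = 0.5$) toward the target $r + \gamma \cdot 0 = -1$, so the agent starts to learn that stepping into the penalty cell is bad.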

Activity 6

"Learning to Trade"

Training reward curves showing agent learning over episodes

Your Task

The agent starts badly -- why?
When does performance start improving?
Why does the curve plateau?
What might cause the occasional dips?

Reveal Solution

Early episodes: the agent explores mostly at random, so rewards are low. As its Q-value estimates improve, it increasingly exploits good strategies and the reward rises. The plateau suggests the policy has largely converged. Dips come from exploration: under an epsilon-greedy strategy the agent occasionally takes a random action to discover potentially better strategies.
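Epsilon-greedy action selection, the source of those dips, can be sketched as below. The Q-values and the value of epsilon are made up for illustration, and `epsilon_greedy` is a name chosen for this example.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict mapping action -> estimated value.

    With probability epsilon choose uniformly at random (explore);
    otherwise choose the highest-valued action (exploit).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

# Hypothetical Q-values for one state of the robot trader.
q_values = {"buy": 0.8, "sell": -0.3, "hold": 0.1}

rng = random.Random(0)  # seeded so repeated runs give the same counts
counts = {action: 0 for action in q_values}
for _ in range(1000):
    counts[epsilon_greedy(q_values, epsilon=0.1, rng=rng)] += 1
print(counts)
```

Most picks go to the best-valued action, with a small share spread across all actions. Many implementations also shrink epsilon over time so that exploration fades as the estimates become more reliable.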