Word Embeddings
From Words to Vectors
Part 1: Foundations (35 slides)
Words That Do Algebra: How can "king - man + woman = queen" work mathematically? The answer reveals that word embeddings capture meaning geometrically - similar words cluster together, and relationships become vector directions.
Prerequisites
- Week 1: N-gram language models and probability basics
- Linear algebra basics (vectors, dot products)
- Understanding of neural network fundamentals helpful but not required
Overview
Transform words into dense vectors that capture semantic meaning, covering Word2Vec, GloVe, and vector arithmetic.
Learning Objectives
- Explain why sparse one-hot vectors are problematic for NLP
- Understand the distributional hypothesis ("you shall know a word by the company it keeps")
- Describe the Skip-gram and CBOW architectures and how they are trained
- Perform word arithmetic (king - man + woman = queen)
- Visualize and interpret word embedding spaces
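The word-arithmetic and similarity objectives above can be sketched with toy vectors. This is an illustration only: the 4-dimensional embeddings below are hand-picked, whereas real embeddings are learned and typically have 50-300+ dimensions.

```python
import numpy as np

# Hand-picked toy embeddings (illustrative only; real ones are learned).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
}

def cosine(u, v):
    """Cosine similarity: the cosine of the angle between u and v."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# king - man + woman should land nearest to queen in the embedding space.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # queen
```

In practice, analogy evaluation usually excludes the three input words from the candidate set; with these toy vectors "queen" wins either way.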
Key Topics
Word2Vec
Skip-gram
Negative sampling
Word analogies
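The negative-sampling topic listed above can be made concrete with a minimal sketch of the skip-gram negative-sampling loss. All names and sizes here are my own illustrative choices (tiny random matrices stand in for learned embedding tables); this is a sketch of the objective, not a full trainer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two embedding tables, as in Word2Vec: one for center words, one for
# context words. Sizes are arbitrary toy values.
dim, vocab_size, k = 8, 10, 3
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # center-word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

def ns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    push the true pair's dot product up, and k random negatives down."""
    pos = -np.log(sigmoid(W_in[center] @ W_out[context]))
    neg = -np.log(sigmoid(-(W_in[center] @ W_out[negatives].T))).sum()
    return pos + neg

negatives = rng.integers(0, vocab_size, size=k)  # k random "noise" words
print(ns_loss(center=2, context=5, negatives=negatives))
```

Training would take gradients of this loss with respect to the two tables; negative sampling replaces the full-vocabulary softmax with k binary classifications, which is what makes Word2Vec fast.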
Key Concepts
One-hot encoding: Sparse vector with a single 1 and the rest 0s (inefficient)
Word embeddings: Dense, low-dimensional word representations
Distributional hypothesis: Similar words appear in similar contexts
Skip-gram: Predict context words from the center word
CBOW (Continuous Bag of Words): Predict the center word from its context
Cosine similarity: Measure of the angle between vectors (semantic similarity)
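Two of the concepts above, one-hot encoding and skip-gram context windows, can be sketched in a few lines. The function names and the window size are my own illustrative choices.

```python
vocab = ["the", "cat", "sat", "on", "mat"]

# One-hot encoding: |V|-dimensional, a single 1. Any two distinct words
# have dot product 0, so one-hot vectors carry no similarity signal.
def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Skip-gram training pairs: predict each context word from the center word
# within a window. CBOW reverses the direction (context -> center).
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(one_hot("cat"))                        # [0, 1, 0, 0, 0]
print(skipgram_pairs(["the", "cat", "sat"]))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```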
Key Visualizations
Word As Vector Concept
Semantic Space 2D
Skipgram Training Steps
Word2Vec Architectures