Word Embeddings

From Words to Vectors

Part 1: Foundations (35 slides)

Words That Do Algebra: How can "king - man + woman = queen" work mathematically? The answer reveals that word embeddings capture meaning geometrically - similar words cluster together, and relationships become vector directions.
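The analogy above can be sketched in a few lines of Python. The 4-dimensional vectors below are hand-crafted toy values, not learned embeddings; real Word2Vec vectors typically have 100-300 dimensions.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (hand-crafted for illustration):
# dimensions roughly encode "royalty", "male", "female", "other".
emb = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.1, 0.8, 0.2],
    "man":   [0.1, 0.9, 0.1, 0.3],
    "woman": [0.1, 0.1, 0.9, 0.3],
    "apple": [0.0, 0.2, 0.2, 0.9],
}

# king - man + woman, computed componentwise.
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# The nearest remaining word (excluding the three query words) is "queen".
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, emb[w]))
print(best)  # queen
```

Note that the query words themselves are excluded from the nearest-neighbor search; this is standard practice when evaluating analogies, since the target vector usually stays closest to "king" itself.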

Prerequisites

  • Week 1: N-gram language models and probability basics
  • Linear algebra basics (vectors, dot products)
  • Understanding of neural network fundamentals helpful but not required

Overview

This module shows how to transform words into dense vectors that capture semantic meaning, covering Word2Vec, GloVe, and vector arithmetic.

Learning Objectives

  • Explain why sparse one-hot vectors are problematic for NLP
  • Understand the distributional hypothesis ("You shall know a word by the company it keeps")
  • Implement Skip-gram and CBOW conceptually
  • Perform word arithmetic (king - man + woman = queen)
  • Visualize and interpret word embedding spaces
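The first objective, why one-hot vectors are problematic, can be seen directly in code. The toy 5-word vocabulary below is an assumption for illustration; real vocabularies contain tens of thousands of words, making one-hot vectors enormously sparse.

```python
# One-hot vectors carry no similarity information: every pair of
# distinct words is orthogonal (dot product 0) and equidistant.
vocab = ["cat", "dog", "car", "king", "queen"]

def one_hot(word):
    # A vector of zeros with a single 1 at the word's vocabulary index.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# "cat" looks exactly as unrelated to "dog" as it does to "car":
dot = sum(a * b for a, b in zip(one_hot("cat"), one_hot("dog")))
print(dot)  # 0
```

Dense embeddings fix this by placing related words near each other, so their dot products (and cosine similarities) become meaningful.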

Key Topics

  • Word2Vec
  • Skip-gram
  • Negative sampling
  • Word analogies

Key Concepts

  • One-hot encoding: Sparse vectors with a single 1 and the rest 0s (inefficient)
  • Word embeddings: Dense, low-dimensional word representations
  • Distributional hypothesis: Similar words appear in similar contexts
  • Skip-gram: Predict context words from the center word
  • CBOW (Continuous Bag of Words): Predict the center word from its context
  • Cosine similarity: Angle-based measure of vector similarity (used as semantic similarity)
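The Skip-gram concept above starts from (center, context) training pairs extracted with a sliding window. A minimal sketch, assuming a symmetric window of size 2 (the window size is a hyperparameter, not fixed by the model):

```python
def skipgram_pairs(tokens, window=2):
    # For each position i, pair the center word with every word
    # within `window` positions to its left and right.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox".split()
pairs = skipgram_pairs(sentence)
print(pairs)
# 10 pairs, e.g. ('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...
```

CBOW uses the same window but inverts the prediction direction: the context words jointly predict the center word.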

Key Visualizations

Word As Vector Concept
Semantic Space 2D
Skipgram Training Steps
Word2Vec Architectures

Resources

Moodle Resources (HS25)