Word Embeddings
From Words to Vectors
Part 1: Foundations (35 slides)
Words That Do Algebra: How can "king - man + woman = queen" work mathematically? The answer reveals that word embeddings capture meaning geometrically - similar words cluster together, and relationships become vector directions.
Prerequisites
- Week 1: N-gram language models and probability basics
- Linear algebra basics (vectors, dot products)
- Understanding of neural network fundamentals helpful but not required
Overview
Transform words into dense vectors that capture semantic meaning, covering Word2Vec, GloVe, and vector arithmetic.
Learning Objectives
- Explain why sparse one-hot vectors are problematic for NLP
- Understand the distributional hypothesis ("you shall know a word by the company it keeps")
- Describe the Skip-gram and CBOW architectures and how they are trained
- Perform word arithmetic (king - man + woman = queen)
- Visualize and interpret word embedding spaces
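The word-arithmetic and similarity objectives above can be sketched with toy vectors. This is an illustration only: the 4-dimensional embeddings below are hand-picked, whereas real embeddings are learned and typically have 50-300+ dimensions.

```python
import numpy as np

# Hand-picked toy embeddings (illustrative only; real ones are learned).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
}

def cosine(u, v):
    """Cosine similarity: the cosine of the angle between u and v."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# king - man + woman should land nearest to queen in the embedding space.
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # queen
```

In practice, analogy evaluation usually excludes the three input words from the candidate set; with these toy vectors "queen" wins either way.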
Key Topics
Word2Vec
Skip-gram
Negative sampling
Word analogies
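The negative-sampling topic listed above can be made concrete with a minimal sketch of the skip-gram negative-sampling loss. All names and sizes here are my own illustrative choices (tiny random matrices stand in for learned embedding tables); this is a sketch of the objective, not a full trainer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two embedding tables, as in Word2Vec: one for center words, one for
# context words. Sizes are arbitrary toy values.
dim, vocab_size, k = 8, 10, 3
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # center-word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

def ns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    push the true pair's dot product up, and k random negatives down."""
    pos = -np.log(sigmoid(W_in[center] @ W_out[context]))
    neg = -np.log(sigmoid(-(W_in[center] @ W_out[negatives].T))).sum()
    return pos + neg

negatives = rng.integers(0, vocab_size, size=k)  # k random "noise" words
print(ns_loss(center=2, context=5, negatives=negatives))
```

Training would take gradients of this loss with respect to the two tables; negative sampling replaces the full-vocabulary softmax with k binary classifications, which is what makes Word2Vec fast.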
Key Concepts
One-hot encoding: Sparse vector with a single 1 and the rest 0s (inefficient)
Word embeddings: Dense, low-dimensional word representations
Distributional hypothesis: Similar words appear in similar contexts
Skip-gram: Predict context words from the center word
CBOW (Continuous Bag of Words): Predict the center word from its context
Cosine similarity: Measure of the angle between vectors (semantic similarity)
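Two of the concepts above, one-hot encoding and skip-gram context windows, can be sketched in a few lines. The function names and the window size are my own illustrative choices.

```python
vocab = ["the", "cat", "sat", "on", "mat"]

# One-hot encoding: |V|-dimensional, a single 1. Any two distinct words
# have dot product 0, so one-hot vectors carry no similarity signal.
def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Skip-gram training pairs: predict each context word from the center word
# within a window. CBOW reverses the direction (context -> center).
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(one_hot("cat"))                        # [0, 1, 0, 0, 0]
print(skipgram_pairs(["the", "cat", "sat"]))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```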
Key Visualizations
Word As Vector Concept
Semantic Space 2D
Skipgram Training Steps
Word2Vec Architectures