In 1736, Euler solved a puzzle about bridges and accidentally invented a new branch of mathematics. Nearly 300 years later, that same mathematics describes the internet, social networks, biological systems — and the architecture of every AI model. Graph theory is the mathematics of connections, and intelligence is, at its core, about making the right connections.
The Timeline
Leonhard Euler
Can you cross all seven bridges of Königsberg exactly once? Euler proved it’s impossible by abstracting the city into nodes (landmasses) and edges (bridges). In doing so, he invented graph theory. The key insight: what matters is not the shape of the bridges, but the pattern of connections. This abstraction — from physical structure to connectivity — is exactly how neural network architectures are designed.
Euler’s criterion: A connected graph has an Eulerian path iff it has exactly 0 or 2 vertices of odd degree.
Euler proved that the shape doesn’t matter — only the connections do. This is the founding insight of graph theory AND neural network architecture design.
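Euler’s parity criterion is easy to check in code. A minimal sketch (connectivity of the edge-bearing vertices is assumed, not verified):

```python
from collections import defaultdict

def has_eulerian_path(edges):
    """Euler's criterion: a connected graph has an Eulerian path
    iff it has exactly 0 or 2 vertices of odd degree."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    return odd in (0, 2)

# The seven bridges of Königsberg: four landmasses A, B, C, D
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(has_eulerian_path(konigsberg))  # False: all four vertices have odd degree
```

Note that only the degree sequence matters here, not any geometry — which is exactly Euler’s point.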
Arthur Cayley
Cayley’s formula: the number of labeled trees on $n$ vertices is $n^{n-2}$. Tree structures became fundamental in computer science — parse trees for language, decision trees for classification, syntax trees for compilers. Every time an NLP system parses a sentence’s grammatical structure, it’s building a tree in the sense of Cayley.
Parse trees represent sentence structure: “The cat sat on the mat” has a tree showing subject, verb, and prepositional phrase. Before transformers, NLP was built on trees.
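Cayley’s formula is simple enough to state directly in code:

```python
def cayley(n):
    """Cayley's formula: the number of labeled trees on n vertices is n^(n-2)."""
    return n ** (n - 2)

# Small cases: 1 tree on 2 vertices, 3 on 3 vertices, 16 on 4, 125 on 5
print([cayley(n) for n in range(2, 6)])  # [1, 3, 16, 125]
```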
Paul Erdős & Alfréd Rényi
What happens when you randomly connect nodes? Erdős and Rényi discovered phase transitions: below a critical threshold, the graph is fragmented; above it, a giant connected component suddenly appears. This “emergence” from random connections prefigures the emergent abilities of neural networks — at some scale, new capabilities suddenly appear.
Giant component appears when edge probability $p > \frac{1}{n}$, i.e., average degree $> 1$.
Phase transitions in random graphs mirror the “emergent abilities” of LLMs: at some size, models suddenly gain capabilities (reasoning, translation, coding) that smaller models completely lack.
Duncan Watts & Steven Strogatz
Watts and Strogatz showed that most real networks (social, biological, technological) are “small worlds” — highly clustered locally, but with short paths globally. Famously, about six degrees of separation connect any two people. This structure — local clustering with global shortcuts — is remarkably similar to how residual connections work in transformers: local processing with skip connections that create shortcuts.
Skip connections in ResNets and transformers create “shortcuts” through the network — making them small-world networks where information can flow quickly from any layer to any other.
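The shortcut effect can be demonstrated on a ring lattice: adding even a single long-range edge shortens average path lengths. A small sketch (the lattice size and neighbor count are illustrative):

```python
from collections import deque

def avg_path_length(adj, n):
    """Mean shortest-path length over all node pairs (BFS from each node);
    assumes the graph is connected."""
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def ring_lattice(n, k=2):
    """Ring where each node links to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    return adj

n = 200
adj = ring_lattice(n)
print(avg_path_length(adj, n))  # long paths: roughly n / (4k)
adj[0].add(n // 2); adj[n // 2].add(0)  # one shortcut across the ring
print(avg_path_length(adj, n))  # noticeably shorter: the small-world effect
```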
Larry Page & Sergey Brin (Google)
PageRank models the web as a graph and computes each page’s importance from the importance of the pages linking to it. This recursive definition is solved with eigenvectors of the link matrix — the same linear algebra that underlies word embeddings. PageRank was the first massive-scale application of graph theory to information retrieval, and it launched one of the most valuable companies in the world.
$$PR(p) = \frac{1-d}{N} + d \sum_{q \in B_p} \frac{PR(q)}{L(q)}$$
Where $d \approx 0.85$ is the damping factor, $N$ is the total number of pages, $B_p$ is the set of pages linking to $p$, and $L(q)$ is the number of outgoing links from $q$.
Google’s PageRank is a random walk on a graph — a Markov chain. It computes the stationary distribution: “If you randomly clicked links forever, how often would you visit each page?”
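The stationary distribution can be computed by power iteration. A minimal sketch (the toy web and iteration count are illustrative; dangling pages distribute their rank evenly):

```python
def pagerank(links, d=0.85, iters=100):
    """Power iteration for PageRank.
    links[q] = list of pages q links to."""
    pages = sorted(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for q, outs in links.items():
            targets = outs if outs else pages  # dangling page: spread evenly
            share = pr[q] / len(targets)
            for p in targets:
                new[p] += d * share
        pr = new
    return pr

# Tiny web: every page links to 'home', so it ranks highest
web = {"home": ["a"], "a": ["home", "b"], "b": ["home"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # 'home'
```

Because every page redistributes its full rank each step, the ranks always sum to 1 — a proper probability distribution over pages, matching the random-surfer interpretation.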
Various (AlexNet, VGG, ResNet, Inception)
Every neural network IS a directed graph: nodes are neurons, edges are weighted connections. The architecture revolution (2012–2017) was about graph topology: deeper graphs (VGG), branching graphs (Inception), graphs with skip edges (ResNet). The transformer’s self-attention is a specific graph: a complete directed graph in which every token connects to every other token.
The transformer is a graph where every token attends to every other token — it’s a complete graph. This $O(n^2)$ connectivity is both its strength (any word can relate to any other) and its weakness (quadratic memory cost).
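The complete-graph view can be made concrete: a layer’s attention weights form a row-stochastic $n \times n$ matrix — a weighted adjacency matrix of the complete graph. A toy sketch with made-up scores (real transformers compute scores from learned query/key projections):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_graph(scores):
    """Row-wise softmax over an n x n score matrix: the weighted adjacency
    matrix of the complete attention graph (every token attends to every token)."""
    return [softmax(row) for row in scores]

# 3 tokens -> a 3x3 weight matrix: O(n^2) edges
A = attention_graph([[1.0, 0.0, 0.0],
                     [0.0, 2.0, 0.0],
                     [0.5, 0.5, 1.0]])
for row in A:
    print([round(w, 2) for w in row])  # each row sums to 1
```

The $n \times n$ matrix is exactly where the quadratic memory cost comes from: the number of edges in a complete graph grows as $O(n^2)$.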
Google Knowledge Graph, various
Knowledge graphs represent facts as (entity, relation, entity) triples: (Einstein, born_in, Germany). Google’s Knowledge Graph (2012) contains billions of facts. Retrieval-Augmented Generation (RAG) combines LLMs with knowledge graphs: the LLM generates text while consulting a graph database for factual accuracy. This hybrid approach addresses hallucination — one of the biggest challenges in AI.
RAG is one of the most widely used techniques for making LLMs factually accurate. In its graph-based variant, the LLM’s query is turned into a search over the knowledge graph, and the retrieved facts are injected back into the generation process.
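A toy sketch of the graph-consultation step (the triples and helper names here are invented for illustration; a real system would query a graph database and pass the prompt to an actual LLM):

```python
# Facts as (entity, relation, entity) triples
TRIPLES = [
    ("Einstein", "born_in", "Germany"),
    ("Einstein", "field", "physics"),
    ("Germany", "capital", "Berlin"),
]

def retrieve(entity):
    """Return all triples mentioning the entity (a one-hop graph search)."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def build_prompt(question, entity):
    """Inject retrieved facts into the generation context."""
    facts = "; ".join(f"{s} {r} {o}" for s, r, o in retrieve(entity))
    return f"Facts: {facts}\nQuestion: {question}"

print(build_prompt("Where was Einstein born?", "Einstein"))
```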
Various (GNN community)
Graph Neural Networks generalize transformers to arbitrary graph structures. Instead of attending to all tokens (complete graph), GNNs attend only to graph neighbors. Graph Transformers combine the best of both: graph structure for efficiency, attention for expressiveness. Applications span drug discovery (molecular graphs), social network analysis, and improving LLMs themselves.
Drug discovery AI represents molecules as graphs (atoms = nodes, bonds = edges). GNNs predict whether a molecule will be an effective drug — graph theory saving lives.
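The neighbor-only attention of a GNN can be sketched as one message-passing layer. This is a simplified averaging update with no learned weights — real GNNs apply learned transformations before aggregating:

```python
def gnn_layer(adj, features):
    """One message-passing step: each node averages its own feature vector
    with those of its graph neighbors (a simplified GCN-style update)."""
    new = []
    for i, neighbors in enumerate(adj):
        group = [features[i]] + [features[j] for j in neighbors]
        dim = len(features[i])
        new.append([sum(v[k] for v in group) / len(group) for k in range(dim)])
    return new

# A path graph 0-1-2. Unlike a transformer (complete graph), node 0's
# signal only reaches node 2 after two layers of message passing.
adj = [[1], [0, 2], [1]]
x = [[1.0], [0.0], [0.0]]
h1 = gnn_layer(adj, x)
print(h1)  # node 2 is still 0.0 after one hop
h2 = gnn_layer(adj, h1)
print(h2)  # node 0's signal has now reached node 2
```

The contrast with the transformer is visible in the receptive field: information travels one edge per layer, which is efficient on sparse graphs but requires depth to span long distances.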
The Thread That Connects
From bridges in Königsberg to transformer attention patterns, graph theory reveals the architecture of intelligence. Connections matter more than the things being connected. The topology of a neural network — which neurons connect to which — determines what it can learn, just as the topology of a social network determines what information flows.
Connections to Other Lectures
- Lecture 2: Linear Algebra & Transformations — Adjacency matrices and eigenvalues are the linear algebra of graphs; PageRank is an eigenvector problem.
- Lecture 5: Geometry of High Dimensions — Graph embeddings map nodes into high-dimensional geometric spaces where distance encodes connectivity.
- Lecture 4: Logic & Computation — Computational complexity of graph problems (P vs NP) — many hard optimization problems are graph problems.