The discovery that music can be described by numbers is one of humanity’s oldest mathematical insights. From Pythagoras hearing harmony in hammers to Fourier decomposing heat into waves, the mathematics of vibrations has an unbroken thread leading to how modern AI encodes the position of words in a sentence — using the very same sine and cosine functions that describe a vibrating guitar string.
The Timeline
Pythagoras of Samos
Legend says Pythagoras walked past a blacksmith and noticed that hammers of certain weight ratios produced harmonious sounds. He discovered that harmony corresponds to simple numerical ratios: octave (2:1), fifth (3:2), fourth (4:3). This was the first discovery that nature obeys mathematical relationships — and specifically that waves and vibrations have mathematical structure.
Pythagoras showed that beauty (harmony) has mathematical structure. 2,500 years later, we discovered that meaning (in language) also has mathematical structure.
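The ratios are concrete enough to compute with. A toy Python sketch, taking the modern reference pitch A4 = 440 Hz as an assumption (Pythagoras, of course, had no hertz):

```python
# Pythagorean interval ratios applied to a reference pitch.
# A4 = 440 Hz (modern concert pitch) is assumed purely for illustration.
ratios = {"octave": 2 / 1, "fifth": 3 / 2, "fourth": 4 / 3}

def interval_frequency(base_hz, ratio):
    """Frequency of the note a given interval ratio above the base pitch."""
    return base_hz * ratio

a4 = 440.0
for name, r in ratios.items():
    print(f"{name}: {interval_frequency(a4, r):.1f} Hz")
# A fifth above A440 is 660 Hz; an octave above is 880 Hz.
```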
Jean le Rond d’Alembert & Daniel Bernoulli
D’Alembert derived the wave equation describing a vibrating string. Bernoulli proposed that any vibration is a sum of simple harmonic modes (sines and cosines). This was controversial: can every shape really be built from waves? The answer, supplied by Fourier decades later, was yes. This idea of decomposing complexity into simple waves is the foundation of signal processing and, surprisingly, of transformer position encoding.
Bernoulli’s claim that any vibration is a sum of sines was rejected by Euler and d’Alembert. Fourier proved Bernoulli right 60 years later.
Joseph Fourier
Fourier’s radical claim: ANY periodic function (any reasonably well-behaved one, at least) can be decomposed into a sum of sines and cosines. He developed this to study heat flow, but it became one of the most important ideas in all of mathematics and engineering. Fourier analysis lets you see any signal as a mixture of frequencies, and this frequency-domain view is how transformers encode position.
Fourier’s idea is everywhere: JPEG compression, audio processing, MRI machines, telecommunications, and transformer position encoding all use Fourier decomposition.
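Bernoulli’s claim can be checked numerically: even a square wave, all corners, emerges from smooth sines. A pure-Python sketch of the standard square-wave Fourier series, $\frac{4}{\pi}\sum_{k \text{ odd}} \frac{\sin kx}{k}$ (the evaluation point and term counts are illustrative):

```python
import math

def square_wave_partial(x, n_terms):
    """Partial Fourier sum for a unit square wave:
    (4/pi) * (sin x + sin 3x / 3 + sin 5x / 5 + ...)."""
    return (4 / math.pi) * sum(
        math.sin(k * x) / k for k in range(1, 2 * n_terms, 2)
    )

# The square wave equals 1 at x = pi/2; the partial sums close in on it
# (with the famous Gibbs overshoot near the jumps).
for n in (1, 10, 100):
    print(n, round(square_wave_partial(math.pi / 2, n), 4))
```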
Harry Nyquist & Claude Shannon
To perfectly reconstruct a continuous signal from discrete samples, you must sample at a rate greater than twice the highest frequency present. This theorem governs all digital audio (CD quality: 44,100 samples/second for sounds up to 22,050 Hz), all digital images, and, more loosely, how AI discretizes continuous information into tokens.
Music CDs sample at 44.1 kHz because human hearing tops out at ~20 kHz. The Nyquist theorem says 40 kHz suffices — CD quality adds a small margin.
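The “folding” that happens above the Nyquist limit is easy to compute. A minimal sketch (the helper name `alias_frequency` is made up for illustration, not a standard API):

```python
def alias_frequency(f, fs):
    """Apparent frequency of a pure tone at f Hz after sampling at fs Hz.
    Frequencies above fs/2 (the Nyquist limit) fold back into [0, fs/2]."""
    f = f % fs
    return min(f, fs - f)

fs = 44_100  # CD sampling rate in Hz, Nyquist limit 22,050 Hz
print(alias_frequency(20_000, fs))  # 20000: below the limit, kept faithfully
print(alias_frequency(30_000, fs))  # 14100: above the limit, aliases downward
```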
James Cooley & John Tukey
The FFT algorithm computes the Discrete Fourier Transform in $O(n \log n)$ instead of $O(n^2)$ — a speedup so dramatic it’s been called one of the most important algorithms of the 20th century. Without the FFT, modern signal processing, telecommunications, and many AI applications would be computationally impractical. It’s also the inspiration behind efficient attention mechanisms.
A naive DFT of $N$ points costs $O(N^2)$ operations; the FFT reduces this to $O(N \log N)$.
The FFT made the impossible practical. Transforming a million-point signal went from $10^{12}$ operations to $2 \times 10^{7}$ — a 50,000× speedup.
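Both operation counts correspond to concrete algorithms, and their agreement is easy to verify on small inputs. A pure-Python sketch assuming a power-of-two length (production code would call an optimized library such as `numpy.fft.fft`):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform: O(N^2) operations."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fft(x):
    """Radix-2 Cooley-Tukey FFT: O(N log N). Length must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + twiddle[k] for k in range(n // 2)] +
            [even[k] - twiddle[k] for k in range(n // 2)])

# Same answer, vastly different cost as N grows:
signal = [1.0, 2.0, 0.0, -1.0, 1.5, 0.5, -0.5, 2.5]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(signal), dft(signal)))
```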
Jean Morlet, Ingrid Daubechies & Stéphane Mallat
Wavelets improved on Fourier by providing BOTH frequency AND time information simultaneously. A Fourier transform tells you which frequencies are present, but not when. Wavelets are localized waves that capture transient features. In AI, multi-scale processing (like the hierarchical features learned by CNNs and the multi-head attention of transformers) follows the wavelet philosophy: analyze at multiple resolutions.
Multi-head attention in transformers operates at multiple “scales” — each head can attend to different ranges of context, just like wavelets analyze at multiple resolutions.
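The time-localization that a global Fourier basis lacks can be seen with the simplest wavelet, the Haar wavelet. A sketch of one (unnormalized) analysis step on a toy signal with a single transient:

```python
def haar_step(x):
    """One level of an unnormalized Haar wavelet transform:
    pairwise averages (coarse trend) and differences (localized detail)."""
    averages = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    details = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return averages, details

# A flat signal with one spike: the nonzero detail coefficient tells you
# WHERE the transient is, not just that some high frequency exists.
signal = [1, 1, 1, 1, 1, 9, 1, 1]
avg, det = haar_step(signal)
print(avg)  # [1.0, 1.0, 5.0, 1.0]
print(det)  # [0.0, 0.0, -4.0, 0.0] -- the spike is localized at pair 2
```

Applying `haar_step` again to the averages gives the next coarser scale, which is the multi-resolution recursion the wavelet philosophy is built on.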
Vaswani et al.
Transformers process words in parallel, so positional information must be injected explicitly. The original solution: add sine and cosine waves at geometrically spaced frequencies. Position 1 gets one pattern; position 2 gets a different pattern. Like a Fourier basis, each position has a unique “fingerprint.” The brilliance: the model can learn to compute relative positions, because the angle-addition identities $\sin(a+b) = \sin a \cos b + \cos a \sin b$ and $\cos(a+b) = \cos a \cos b - \sin a \sin b$ make the encoding of position $a+b$ a fixed linear transformation of the encoding of position $a$.
Position encoding IS a Fourier basis. Each dimension is a sine/cosine wave at a different frequency. The position of a word is encoded as a point in this frequency space — Fourier analysis applied to language.
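The encoding is short enough to write out in full. A minimal pure-Python sketch following the formula from “Attention Is All You Need,” where dimension pair $2i$ uses the angle $pos / 10000^{2i/d_{model}}$:

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal position encoding (Vaswani et al., 2017):
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Each dimension pair is a wave at a different frequency."""
    pe = []
    for two_i in range(0, d_model, 2):
        angle = position / (10000 ** (two_i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:d_model]

# Every position gets a distinct fingerprint across the frequencies:
print(positional_encoding(0, 8))  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(positional_encoding(5, 8))
```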
Various Researchers
Standard attention is $O(n^2)$, quadratic in sequence length. Spectral methods use the FFT to bring the mixing cost down to $O(n \log n)$, just as Cooley-Tukey accelerated Fourier transforms. FNet (Google, 2021) replaced attention entirely with Fourier transforms, retaining about 92% of BERT’s accuracy while running up to 7× faster. Hyena (2023) uses long convolutions evaluated in the frequency domain. The future of efficient AI may be Fourier-based.
FNet layer: two FFTs replace self-attention.
FNet showed that replacing attention with simple Fourier transforms loses only 8% accuracy but runs 7× faster. Fourier’s 200-year-old idea may be the key to efficient AI.
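The heart of the FNet idea fits in a few lines. A sketch of the parameter-free mixing sublayer only, with a naive DFT standing in for the FFT; the real model wraps this in residual connections, layer norms, and feed-forward sublayers, all omitted here:

```python
import cmath

def dft(x):
    """Naive 1D DFT (stand-in for an FFT; same result, O(N^2))."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fnet_mixing(tokens):
    """FNet-style token mixing: Fourier-transform along the hidden
    dimension, then along the sequence dimension, and keep the real
    part. No learned parameters; this step replaces self-attention."""
    hidden_mixed = [dft(row) for row in tokens]          # mix within tokens
    cols = list(zip(*hidden_mixed))                      # transpose
    seq_mixed = [dft(list(col)) for col in cols]         # mix across tokens
    return [[v.real for v in row] for row in zip(*seq_mixed)]

# Toy "embeddings": 4 tokens, hidden size 4.
x = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
mixed = fnet_mixing(x)
print(mixed[0])  # each output row now mixes information from every token
```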
The Thread That Connects
From Pythagoras hearing harmony in hammers to FNet using Fourier transforms as attention, the mathematics of waves has always been about decomposing complexity into simple, understandable components. Position encoding, efficient attention, multi-scale processing — all are descendants of the insight that any signal can be built from waves.
Connections to Other Lectures
- Lecture 6: Number Theory & Encoding — Positional encoding and RoPE connect number-theoretic structure with harmonic functions.
- Lecture 2: Linear Algebra & Transformations — The FFT is a factorization of the DFT matrix; Fourier analysis is linear algebra in disguise.
- Lecture 3: Calculus & Optimization — The wave equation and PDEs that gave rise to Fourier analysis are calculus at its deepest.