A fun, accessible look at how mathematics and artificial intelligence help banks, apps, and digital services make decisions
A 45-Minute Talk for High School Students
Every morning, AI makes hundreds of invisible decisions about you
- “You already use AI finance” → Surprise, curiosity
- “Math finds what humans miss” → “Whoa, that’s clever”
- “From data to action” → Empowerment
- “Power, fairness, and your future” → Reflection
- “Math is your superpower” → Inspiration
See Bayes, Sigmoid, Bell Curve & Scatter Plot visualizations ↓
| Concept | Where | How Presented | Formula |
|---|---|---|---|
| Probability | Section 2 | Cafeteria analogy — “What are the chances the mystery meat is good?” | Informal introduction, no formula yet |
| Bayes’ Theorem | Section 3 | Updating your belief when new evidence arrives — “How surprised should you be?” | $$P(\text{Fraud} \mid \text{Data}) = \frac{P(\text{Data} \mid \text{Fraud}) \cdot P(\text{Fraud})}{P(\text{Data})}$$ |
| Normal Distribution | Section 3 | The bell curve describes “normal” spending — outliers trigger alerts | $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$ |
| Sigmoid Function | Section 3 | Squishes any number into a probability between 0 and 1 | $$\sigma(x) = \frac{1}{1 + e^{-x}}$$ |
| Decision Boundaries | Section 3 | The line where the AI switches from “OK” to “suspicious” | Visual concept (threshold on sigmoid output) |
| Gradient Descent | Section 4 | Named only — “the AI rolls downhill to find the best answer” | Named, not derived |
| Weighted Average | Section 6 | Credit scores work like school grades — different factors carry different weight | $$\text{Score} = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$ |
| Linear Regression | Section 6 | The simplest prediction: draw a straight line through data points | $$y = mx + b$$ |
| Dot Product | Section 8 | Multiply matching preferences, add them up to measure similarity | $$\mathbf{A} \cdot \mathbf{B} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$ |
| Cosine Similarity | Section 8 | How similar are two people’s tastes? Measure the angle between their preference vectors | $$\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{|\mathbf{A}| \cdot |\mathbf{B}|}$$ |
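Two of the concepts in the table can be combined into a tiny Python sketch: a weighted sum of transaction features, squished through the sigmoid into a fraud probability. The features, weights, and threshold are all made up for illustration:

```python
import math

def sigmoid(x):
    # Squish any real number into a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-x))

# Made-up transaction features and weights, purely for illustration
features = [2.5, 1.0, 1.0]   # amount z-score, odd-hour flag, new-merchant flag
weights  = [1.2, 0.8, 0.5]   # how much each factor matters

score = sum(w * x for w, x in zip(weights, features))  # the weighted sum
p_fraud = sigmoid(score - 3.0)  # 3.0 is an invented decision threshold

print(round(score, 2))   # 4.3
print(p_fraud > 0.5)     # True: past the decision boundary, so flag it
```

The subtraction of 3.0 is exactly the "decision boundary" idea from the table: scores below the threshold map to probabilities under 0.5, scores above it to probabilities over 0.5.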
Interactive visualizations of the key mathematical concepts
All the formulas from the talk — screenshot-friendly!
How surprised should we be? Update your belief with new evidence.
Squish any number into a probability between 0 and 1.
Different factors matter differently — just like your grade.
Multiply matching preferences, add them up.
How similar are two people’s tastes? Measure the angle.
The simplest prediction: a straight line through your data.
The bell curve — what “normal” looks like mathematically. Outliers trigger alerts.
μ is the mean (center), σ is the standard deviation (spread). The further a transaction is from the center, the more unusual it is.
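The outlier idea can be sketched in a few lines of Python, using made-up transaction amounts:

```python
import statistics

# A week of hypothetical card transactions (EUR)
amounts = [12.5, 9.8, 15.0, 11.2, 14.1, 10.7, 13.4]
mu = statistics.mean(amounts)       # center of "normal" spending
sigma = statistics.stdev(amounts)   # spread

def z_score(x):
    # How many standard deviations is x from normal spending?
    return (x - mu) / sigma

# A sudden 3000 EUR purchase sits far out in the tail of the bell curve
print(z_score(3000) > 3)  # True: unusual enough to trigger an alert
```

A common rule of thumb flags anything beyond about 3 standard deviations; real systems combine many such signals rather than a single threshold.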
How the AI knows which direction to adjust — elegantly expressed in terms of the sigmoid itself.
This derivative is used in backpropagation to train neural networks. Its simplicity is what made early neural networks computationally feasible.
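The identity in question, σ′(x) = σ(x)(1 − σ(x)), can be verified numerically in a few lines of Python:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # The elegant identity: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Numerical check with a central difference
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_prime(x)) < 1e-8

print(sigmoid_prime(0.0))  # 0.25: the curve is steepest at its midpoint
```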
Every story connects a real event to the mathematics behind Large Language Models
Eight Google researchers—including a 20-year-old intern named Aidan Gomez—published a 15-page paper with a Beatles-inspired title. It became the most cited AI paper in history. Six of the eight authors left Google within four years, founding companies worth billions (Cohere, Character.AI, Inceptive). Noam Shazeer, who designed the attention mechanism, quit in 2021 and was brought back in 2024 for $2.7 billion.
Each word generates a Query (what am I looking for?), Key (what do I offer?), and Value (my content). The dot product Q·K measures relevance. Softmax converts scores to probabilities summing to 1. The result: every word “pays attention” to every other word simultaneously—which is why GPUs can train transformers so fast.
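The mechanics can be sketched in toy Python: scaled dot-product attention over made-up 2-number "word" vectors. Real models use hundreds of dimensions and learned projection matrices for Q, K, and V; here Q = K = V for simplicity:

```python
import math

def softmax(scores):
    # Convert raw scores into probabilities summing to 1
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention over toy vectors (lists of floats)
    d = len(Q[0])
    out = []
    for q in Q:
        # Q.K relevance scores, scaled by sqrt(dimension)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # how much this word attends to each other word
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Three "words", each a made-up 2-number vector
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(Q, K, V)
print(len(result), len(result[0]))  # 3 2: one blended vector per word
```

Note that the loop over words has no dependency between iterations: every word's output can be computed at the same time, which is the parallelism GPUs exploit.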
Google researcher Tomas Mikolov submitted a paper that peer reviewers rejected—at a conference with a 70% acceptance rate. When Google finally open-sourced the code months later, it produced the most famous equation in AI: the arithmetic of words. A neural network trained on billions of words discovered that “King − Man + Woman” lands near “Queen” in vector space. The same paper won the NeurIPS Test of Time Award a decade later.
Every word becomes a vector of 300 numbers. Similar words cluster together. Relationships (gender, royalty, country→capital) appear as consistent directions. Cosine similarity measures how close two words are—the same formula from Section 8 of this talk.
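The "arithmetic of words" can be sketched with tiny made-up 3-number embeddings (real word2vec vectors have ~300 numbers, learned from billions of words):

```python
import math

# Tiny invented "embeddings", purely for illustration
vec = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.5, 0.1, 0.1],
    "woman": [0.5, 0.1, 0.9],
    "queen": [0.9, 0.8, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman, component by component
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
best = max(vec, key=lambda word: cosine(vec[word], target))
print(best)  # queen
```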
In 1951, a 35-year-old mathematician at Bell Labs named Claude Shannon ran a remarkable experiment: he asked people to predict the next letter of a text, one character at a time. If wrong, they were told the correct letter. By counting guesses, Shannon measured the statistical structure of English—finding it has only ~1.1 bits of entropy per character (out of a maximum 4.7). He had described exactly what ChatGPT does: minimize uncertainty about the next token. He did it 71 years before ChatGPT existed.
Entropy measures average surprise per symbol. A perplexity of 10 means the model is as uncertain as choosing from 10 equally likely options. Training an LLM on the internet is extreme compression of human knowledge—to predict the next word, the model must learn facts, grammar, logic, and culture.
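Entropy and perplexity can be computed directly on toy next-token distributions:

```python
import math

def entropy(probs):
    # Average surprise, in bits per symbol
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4           # four equally likely next tokens
skewed = [0.7, 0.1, 0.1, 0.1]  # the model has learned some structure

H_uniform = entropy(uniform)
H_skewed = entropy(skewed)
print(H_uniform, round(2 ** H_uniform))  # 2.0 bits, perplexity 4
print(H_skewed < H_uniform)              # True: structure lowers surprise
```

This is Shannon's point in miniature: the more statistical structure a model has learned, the lower its entropy, and the fewer "effective options" (perplexity) it faces per token.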
Every word ChatGPT types is the winner of a probability competition among 100,000+ candidates. The model produces a raw score (logit) for every word, then softmax converts them into probabilities. “The” might get 32%, “a” gets 18%, and 100,000 others share the rest. A parameter called temperature controls randomness: at 0, the model always picks the top word (robotic). At 1, it samples proportionally (creative but risky). Every AI conversation is literally a sequence of weighted dice rolls.
The exponential function amplifies differences: a small advantage in raw score becomes a large probability advantage. Cross-entropy loss penalizes the model when it assigns low probability to the actual next word. If p = 0.01, loss = 4.6 (harsh penalty). If p = 0.99, loss ≈ 0.01 (almost no penalty).
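Both pieces, softmax amplification and the cross-entropy penalty, fit in a short Python sketch (the raw scores are made up):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p_correct):
    # Penalty for assigning probability p_correct to the actual next word
    return -math.log(p_correct)

probs = softmax([5.0, 4.4, 1.0])  # invented raw scores for three candidates
# probs[0] / probs[1] = e**0.6: a 0.6 logit gap nearly doubles the probability

print(round(cross_entropy(0.01), 1))  # 4.6: harsh penalty
print(round(cross_entropy(0.99), 2))  # 0.01: almost none
```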
In 2022, DeepMind proved that every major AI lab had the wrong formula. Everyone was building bigger models, assuming more parameters = better. DeepMind’s Chinchilla—with 70B parameters, half the size of Gopher (280B)—outperformed it on almost every benchmark by training on 4× more data. The secret was a simple power law: scale data and model size equally.
For a fixed compute budget C, optimal model size N and training data D should both scale as the square root of compute: N ∝ C^0.5 and D ∝ C^0.5. The earlier belief (Kaplan 2020) was N ∝ C^0.73, over-weighting size. Performance follows power laws—the same y = ax^b as Kepler’s planetary laws.
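A back-of-the-envelope sketch of the recipe, using the commonly quoted approximations C ≈ 6·N·D and roughly 20 training tokens per parameter. Both are simplifications for illustration, not DeepMind's exact fit:

```python
import math

def chinchilla_split(C, tokens_per_param=20.0):
    # Assumes C = 6*N*D and D = 20*N (rough rules of thumb)
    N = math.sqrt(C / (6.0 * tokens_per_param))  # parameters
    D = tokens_per_param * N                     # training tokens
    return N, D

N1, D1 = chinchilla_split(1e23)  # a hypothetical compute budget in FLOPs
N2, D2 = chinchilla_split(4e23)  # 4x the compute...
print(round(N2 / N1, 2), round(D2 / D1, 2))  # 2.0 2.0: both double (sqrt scaling)
```

Quadrupling compute doubles both the optimal parameter count and the optimal token count, which is exactly the "scale data and model size equally" lesson.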
In 2022, Google Brain researchers discovered that simply adding four words—“Let’s think step by step”—to any prompt dramatically improved LLM math performance. In 2024, OpenAI’s o1 model took this further: trained with reinforcement learning on reasoning traces, it generates thousands of hidden “thinking tokens” before answering. On the 2024 AIME (top 3% of US math students), GPT-4o scored 12%. o1 scored 93%. More thinking time literally makes AI smarter.
Before o1, all compute went into training. Now performance also scales with compute spent at inference—how long the model “thinks.” Mathematically, this is tree search: each thinking step explores a node in a decision tree. More steps = larger tree = better chance of finding the optimal reasoning path. A third dimension of scaling beyond parameters and data.
In January 2025, a two-year-old Chinese company called DeepSeek—founded by hedge fund manager Liang Wenfeng—released model R1 for free. Training compute cost: $5.6 million (vs. GPT-4’s estimated $100M+ total development budget). One week later, it was the #1 app on the US App Store. The same day, Nvidia lost $589 billion in market cap—the largest single-day loss in stock market history. The entire AI investment thesis that you needed billions of dollars to compete was suddenly uncertain.
DeepSeek V3 has 671B total parameters but only 37B are active per input (<6%). A routing function selects which K of N specialist sub-networks (“experts”) handle each token. Result: knowledge capacity of 671B, compute cost of 37B. Plus, R1 learned reasoning through pure reinforcement learning—no human labels needed.
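The routing idea in toy form: pick the top-K scoring experts per token. The gating scores here are invented; real routers learn them:

```python
def route(gate_scores, k=2):
    # Return the indices of the k highest-scoring experts for this token
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:k]

scores = [0.1, 2.3, -0.5, 1.7, 0.9, -1.2, 0.4, 2.0]  # made-up scores, 8 experts
active = route(scores, k=2)
print(sorted(active))  # [1, 7]: only 2 of 8 experts run, ~25% of the compute
```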
Training a language model means adjusting hundreds of billions of numbers to reduce how wrong the model is. The algorithm is beautifully simple: imagine you’re blindfolded on a mountain and want to reach the lowest valley. You feel the slope under your feet and take a small step downhill. Repeat billions of times. The math hasn’t changed since Rumelhart, Hinton & Williams formalized backpropagation in 1986. One of the three authors, Geoffrey Hinton, won the 2024 Nobel Prize in Physics for foundational work on neural networks. What changed: hardware, data, and the transformer architecture it’s applied to.
The gradient ∇θ tells you: “if I nudge this parameter, how much does the error change?” Backpropagation uses the chain rule of calculus to compute gradients for billions of parameters in one backward pass. The learning rate α controls step size—too large and you overshoot, too small and you never arrive.
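The blindfolded-mountain picture in code: one-dimensional gradient descent on f(x) = (x − 3)², whose gradient is 2(x − 3):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Feel the slope, take a small step downhill, repeat
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # the learning rate lr is the step size
    return x

# Minimize f(x) = (x - 3)^2; its gradient is 2*(x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0: the bottom of the valley
```

Try lr=1.1 to see overshooting, or lr=0.0001 to see "never arriving"; training an LLM is the same loop over billions of parameters at once.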
“How many R’s are in the word strawberry?” AI answers: two. The correct answer is three. This became the most-shared LLM failure on the internet. The reason is purely mathematical: GPT-4’s tokenizer splits “strawberry” into [str][aw][berry]. The model never sees individual characters—it sees three tokens. It can’t count letters it can’t see. Fix: ask the model to spell it out letter by letter first, then count. Forcing character-level tokens makes counting trivial.
BPE (1994 compression algorithm) iteratively merges the most frequent character pairs until a target vocabulary is reached. GPT-4 has ~100,000 tokens. Each word gets split into subword chunks. The model reasons about tokens, not characters—explaining why LLMs struggle with letter counting, spelling backwards, and rhyming.
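A miniature BPE in Python: repeatedly merge the most frequent adjacent pair. Ties are broken by first appearance, a simplification of real tokenizers, and the corpus is just three words:

```python
from collections import Counter

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    # max() keeps the first-seen pair on ties (dict insertion order)
    return max(pairs, key=pairs.get)

def merge(words, pair):
    a, b = pair
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(a + b)  # fuse the pair into one token
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

corpus = [list("strawberry"), list("blueberry"), list("berry")]
for _ in range(4):  # four merge rounds
    corpus = merge(corpus, most_frequent_pair(corpus))

print(corpus[0])  # ['s', 't', 'r', 'a', 'w', 'berry']
print(corpus[2])  # ['berry']: a whole word has become a single token
```

After only four merges, "berry" is one token and "strawberry" ends in the chunk "berry", so the model would never see its individual R's, which is the letter-counting failure in miniature.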
Attorney Steven Schwartz asked ChatGPT to find legal precedents for a case against Avianca airlines. ChatGPT cited “Varghese v. China Southern Airlines,” “Shaboon v. Egyptair,” and four more cases. When asked to confirm, it said: “These cases indeed exist and can be found in reputable legal databases.” None of them existed. The judge called the legal reasoning “gibberish.” Schwartz was fined $5,000. He had trusted a probability machine to fact-check itself.
If each token has a 99% chance of being locally plausible, a 100-token response is entirely plausible with probability 0.99^100 ≈ 0.366—a 63% chance of containing at least one error. The model has no truth oracle. “Case name + airline + court” is a high-probability pattern in legal text. Plausible ≠ true.
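The compounding arithmetic, checked directly:

```python
p_token_ok = 0.99             # chance each token is locally plausible
p_all_ok = p_token_ok ** 100  # chance all 100 tokens are plausible

print(round(p_all_ok, 3))      # 0.366
print(round(1 - p_all_ok, 2))  # 0.63: chance of at least one error
```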
In September 2024, researchers published a paper proving that LLM hallucinations are not just engineering bugs—they are mathematically inevitable. Any system that generates text by sampling from a learned probability distribution will sometimes produce false statements. The fix (explicit confidence scoring for every claim) works in theory but is impractical: it would require the model to pause and verify each statement against a factual database, making responses extremely slow and expensive.
The probability distribution over tokens spreads across incorrect possibilities. With ~100K vocabulary items and softmax normalization, there is always non-zero probability mass on wrong tokens. The chain rule of probability compounds: if each step has 1% error rate, a 100-word sentence has ~63% chance of at least one error. Perfect truthfulness requires external verification—something the architecture fundamentally lacks.
Tech entrepreneur David Heinemeier Hansson discovered Apple Card gave him a credit limit 20× higher than his wife’s—despite her having a better credit score. Apple co-founder Steve Wozniak reported the same. The algorithm never explicitly used gender. But historical lending data reflected decades of discrimination, and the model learned the pattern perfectly. In 2024, the CFPB fined Apple $25M and Goldman Sachs $45M.
A “proxy variable” encodes a protected characteristic indirectly: zip code encodes race, shopping patterns encode gender. Chouldechova’s Impossibility Theorem (2017) proves you cannot simultaneously satisfy equal false positive rates, equal false negative rates, and equal calibration. Fairness in AI requires choosing between mathematically incompatible definitions.
200,000 employees at the world’s largest bank now use an LLM daily. Their “LLM Suite” won Innovation of the Year. When markets swung sharply in April 2025, the AI tool Coach helped advisers find information 95% faster. Investment bankers automate 40% of SEC filing analysis. AI-powered fraud detection prevents an estimated $1.5 billion in losses with 98% accuracy across 60+ countries. Total tech budget: $17 billion per year.
JPMorgan’s system uses Retrieval-Augmented Generation (RAG): financial documents are converted into embedding vectors (the same vectors from Story 2), stored in a database, and retrieved by cosine similarity when a query comes in. The LLM then generates answers grounded in actual documents—reducing hallucinations in high-stakes finance.
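The retrieval step can be sketched in toy Python: cosine similarity between a query embedding and document embeddings. All vectors and document names here are invented; real systems use embeddings with ~1000 numbers produced by a neural network:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Made-up document embeddings, purely for illustration
docs = {
    "Q2 earnings report": [0.9, 0.1, 0.2],
    "fraud policy memo":  [0.1, 0.9, 0.3],
    "cafeteria menu":     [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "what were last quarter's profits?"

best = max(docs, key=lambda name: cosine(docs[name], query))
print(best)  # the LLM then answers grounded in this retrieved document
```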
When ChatGPT launched, the underlying model was capable but sometimes offensive or dangerous. The fix: Reinforcement Learning from Human Feedback (RLHF). Humans rate pairs of responses (“which is better?”), training a reward model that scores helpfulness. The LLM is then fine-tuned to maximize that score. Result: independent safety evaluations showed significant reductions in harmful outputs. But researchers also found “reward hacking”—models learning to game the reward model rather than genuinely being helpful. A mathematical cat-and-mouse game.
The idea: maximize how helpful the AI is (reward R) while keeping it close to its original behavior. “Reward hacking” is Goodhart’s Law in action: when a measure becomes a target, it ceases to be a good measure.
KL-divergence DKL measures how far the fine-tuned model π drifts from the reference model πref. β controls the trade-off: too low and the model becomes sycophantic, too high and it ignores human preferences. This is the Lagrangian method from constrained optimization.
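A minimal sketch of the penalized objective on toy next-token distributions; the reward and β values are invented:

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q): how far p has drifted from q (0 means identical)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pi_ref = [0.5, 0.3, 0.2]  # reference model's next-token probabilities
pi_new = [0.7, 0.2, 0.1]  # fine-tuned model, pulled toward higher reward

reward, beta = 1.8, 0.5   # invented numbers for illustration
objective = reward - beta * kl_divergence(pi_new, pi_ref)

print(kl_divergence(pi_ref, pi_ref))      # 0.0: no drift, no penalty
print(kl_divergence(pi_new, pi_ref) > 0)  # True: drift costs reward
```

Raising β makes the drift penalty dominate (the model barely changes); lowering it lets the reward dominate (the model chases the reward signal), which is the trade-off described above.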
Sources: DeepMind, OpenAI, HuggingFace Sept 2025 Report
Presbyterian minister whose theorem powers modern fraud detection. His work on probability was published posthumously by his friend Richard Price in 1763. Core question: “How surprised should you be?”
Used in Section 3 — Fraud Detection
Child prodigy who summed 1 to 100 in seconds (50 pairs of 101 = 5,050). Discovered the bell curve that describes “normal” behavior in data. His hair allegedly matched its shape.
Used in Section 3 — Normal Distribution
Saw that math and computation were one, 180 years before ChatGPT. Wrote the first algorithm for Charles Babbage’s Analytical Engine. “The Analytical Engine weaves algebraic patterns just as the Jacquard loom weaves flowers and leaves.”
Used in Section 4 — How Does the AI Learn?
Used her “coxcomb diagrams” (polar area charts) to prove that sanitation saves more soldiers than medicine. Data visualization pioneer who changed hospital policy through statistics.
Used in Section 2 — Pattern Recognition
Analyzed vowels and consonants in Pushkin’s poetry to discover sequential patterns. His Markov chains now power credit scoring models, autocomplete, and speech recognition.
Used in Section 6 — Credit Scoring
WWII statistician who told the military to armor the parts of returning planes that DID NOT have bullet holes — because the planes hit in those spots never came back. A masterclass in survivorship bias.
Used in Section 5 — Spot the Fraud
Built the Perceptron in 1958 — the first machine that could learn from data. The New York Times headline read: “New Navy Device Learns by Doing.” Ancestor of every neural network alive today.
Used in Section 3 — Fraud Detection
The father of information theory. His 1948 paper defined entropy and showed that all communication is mathematics. In 1951, he ran the first “predict the next character” experiment — exactly what ChatGPT does, 71 years early.
Used in LLM Stories — Story 3
Nightingale + Lovelace remind us: mathematics has always needed diverse thinkers. The field moves forward when different perspectives ask different questions.
Used in Section 10 — The Bigger Picture
Follow BankBot’s journey from overconfident to wise
“I analyzed your breakfast. You are 94% human.” → Overconfident (Sec 1)
“Too easy. Next.” → Cocky (Sec 4 warmup)
(sweating on the boundary) → Stressed (Sec 3)
“FRAUD!” → “Most things are fine” → “Let me calculate…” → Growing (Sec 3.5)
“FRAUD DETECTED!” / “…suspicious flowers.” → Humbled (Sec 4)
“Based on my analysis, you need 47 houseplants.” → Observant (Sec 8)
(shaking head at social media data) → Critical (Sec 9)
“I am 73% confident. But I defer to the human.” → Wise (Sec 10)
“Raise your hand if you KNEW that AI made 50–200 decisions about you before breakfast.”
“Should the AI block this transaction? Hands up for YES, down for NO.”
Three scenarios: Alex buys a guitar in another city; Tomoko buys 50 gift cards at 3 AM; Karla has small charges in 4 countries. Vote: Fraud or Legit?
Predict the AI’s output during the live demo — thumbs up if you think it will approve, thumbs down if it will flag.
Rapid-fire: Should an AI use this data for credit decisions? Income? Social media? Zip code? GPA? Shout “USE IT” or “SKIP IT”!
Visual math explanations that make linear algebra, calculus, and neural networks click.
Free statistics and probability courses — master the foundations at your own pace.
Train your own machine learning model right in the browser — no coding required.
Fun, engaging math videos covering everything from prime numbers to infinity.
By Cathy O’Neil — how unchecked algorithms reinforce inequality. Essential reading on AI bias.
By David Spiegelhalter — learn to make sense of data in everyday life. Accessible and brilliantly written.
Jay Alammar’s visual walkthrough of the transformer architecture—the best visual explanation on the internet.
Grant Sanderson’s visual deep dive into how GPT works—attention, embeddings, and training in one video.
Career paths that combine mathematics, AI, and finance
The detailed plans behind this talk — from content to visuals to deployment
The complete 45-minute talk plan with 10 sections, 8 history vignettes, 7 formulas, BankBot running gag, cartoons, and speaker notes. Includes the storytelling arc, timing philosophy, and backup strategies.
View Talk Plan →
Technical blueprint for this GitHub Pages website — file structure, design system, task breakdown, KaTeX integration, responsive CSS architecture, and deployment verification checklist.
View Deployment Plan →
Comprehensive visual specification: 40 slides, 17 data visualizations, BankBot character bible, cartoon briefs, historical imagery, physical props, live demo UI, and production pipeline.
View Visual Plan →

| Section | Time | Duration | Content |
|---|---|---|---|
| 1. Opening Hook | 0:00 – 3:00 | 3 min | 50–200 AI decisions factoid, BankBot intro, hand raise |
| 2. Think Like an AI | 3:00 – 7:00 | 4 min | Pattern recognition, cafeteria analogy, Nightingale |
| 3. Fraud Detection | 7:00 – 15:30 | 8.5 min | Bayes, bell curve, sigmoid, Gauss/Bayes/Rosenblatt |
| 4. How AI Learns | 15:30 – 17:00 | 1.5 min | Feedback loops, gradient descent, Lovelace |
| 5. Spot the Fraud | 17:00 – 21:00 | 4 min | Interactive voting, Wald, BankBot false positive |
| 6. Credit Scoring | 21:00 – 28:00 | 7 min | Weighted average, linear regression, Markov |
| 7. Live Demo | 28:00 – 32:00 | 4 min | 3-tier demo, formula callbacks |
| 8. Recommendations | 32:00 – 34:00 | 2 min | Dot product, cosine similarity, 47 houseplants |
| 9. Design Your AI | 34:00 – 37:00 | 3 min | USE IT / SKIP IT exercise |
| 10. Bigger Picture | 37:00 – 42:00 | 5 min | Ethics, careers, BankBot finale, call to action |
| Buffer / Q&A | 42:00 – 45:00 | 3 min | Questions, overflow |
Approximately 32 slides total, averaging about 1.3 minutes per slide. Heavier sections (Fraud Detection, Credit Scoring) may use 5–7 slides each; lighter sections (Opening, Closing) use 2–3.