Can a machine think? This question, which seems so modern, has roots stretching back 2,400 years. The path from Aristotle’s logical syllogisms to today’s language models is one of humanity’s greatest intellectual adventures — and at every step, mathematics was the bridge between thought and mechanism.
The Timeline
Aristotle
Aristotle created the first formal system of logic — the syllogism. “All men are mortal. Socrates is a man. Therefore, Socrates is mortal.” He showed that valid reasoning follows patterns, independent of content. This was the first hint that thinking could be reduced to rules.
Aristotle’s logic dominated Western thought for over 2,000 years, and it set the agenda for everything that followed: if valid reasoning follows patterns independent of content, then in principle those patterns can be mechanized.
Gottfried Leibniz & George Boole
Leibniz dreamed of a “calculus of reasoning” — a universal language where disputes could be settled by calculation: “Let us calculate!” Two centuries later, Boole made this real with Boolean algebra, showing that all logic could be reduced to AND, OR, NOT operations on 0s and 1s. This became the foundation of every computer.
Every CPU on Earth runs on Boolean logic. Every neural network computation is ultimately performed by Boolean gates — the legacy of Boole.
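Boole’s reduction is easy to see in code: every compound logical operation can be assembled from the three primitives. A minimal Python sketch, building XOR out of AND, OR, and NOT:

```python
# Boole's insight: all logic reduces to AND, OR, NOT on truth values.
def AND(a, b): return a and b
def OR(a, b): return a or b
def NOT(a): return not a

# Any other operation can be composed from these three primitives.
# XOR, for example, is "(a OR b) AND NOT (a AND b)":
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

for a in (False, True):
    for b in (False, True):
        print(a, b, "->", XOR(a, b))
```

Hardware gates perform exactly these compositions, billions of times per second.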
Kurt Gödel
Gödel shattered the dream of a complete, consistent mathematical system. His First Incompleteness Theorem: any consistent formal system powerful enough to express arithmetic contains true statements it cannot prove. His Second: no such system can prove its own consistency. These theorems revealed fundamental limits of formal reasoning — limits that AI also faces.
The Gödel sentence $G$ says: “$G$ is not provable in this system.” If $G$ is provable, the system is inconsistent. If $G$ is not provable, the system is incomplete.
Can an AI ever fully understand mathematics? Gödel showed that even perfect formal systems have blind spots. This remains one of the deepest unsolved questions in AI.
Alan Turing
In 1936, Turing imagined the simplest possible computing device: a tape, a head, and a table of rules. He proved that this simple machine could compute anything computable — and that some things (like determining whether a program will halt) are fundamentally undecidable. The Turing Machine became the theoretical foundation of all computing.
Turing also proposed the Turing Test (1950): can a machine fool a human into thinking it’s human? LLMs are now routinely passing informal versions of this test.
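The tape-head-rules model is simple enough to simulate in a few lines. The sketch below is a minimal illustration, not a standard API — the `run_turing_machine` helper and the bit-inverting rule table are invented for this example:

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Simulate a one-tape Turing machine.
    rules maps (state, symbol) -> (new_state, write_symbol, move),
    where move is -1 (left), +1 (right), or 0 (stay).
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        state, cells[head], move = rules[(state, symbol)]
        head += move
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)

# A tiny machine that inverts every bit, then halts at the first blank.
invert = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", 0),
}
print(run_turing_machine(invert, "10110"))  # -> 01001
```

Everything a modern GPU computes, this loop could in principle compute too — just unimaginably slower.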
Noam Chomsky
Chomsky classified languages into a hierarchy of complexity: regular → context-free → context-sensitive → recursively enumerable. Each level requires more computational power to parse. Natural language is generally held to be mildly context-sensitive — beyond context-free, but short of the full context-sensitive level. This hierarchy shaped 50 years of natural language processing — until neural networks bypassed it entirely.
Chomsky’s hierarchy says natural language needs sophisticated grammars. Transformers seem to learn these grammars implicitly from data — without being told the rules.
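The jump in power between levels can be shown directly. A finite automaton (here, a regular expression) can check the *shape* of a string, but matching counts — the classic language aⁿbⁿ — needs the stack of a pushdown automaton. A small Python sketch:

```python
import re

def regular_check(s):
    # A finite automaton can verify the shape a...ab...b (the regular
    # language a*b*), but it cannot count how many of each it saw.
    return re.fullmatch(r"a*b*", s) is not None

def context_free_check(s):
    # Recognizing a^n b^n (equal counts) needs a counter/stack:
    # that is exactly the extra power of a context-free grammar.
    stack, seen_b = 0, False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False       # an 'a' after a 'b' breaks the shape
            stack += 1
        elif ch == "b":
            seen_b = True
            stack -= 1
            if stack < 0:
                return False       # more b's than a's so far
        else:
            return False
    return stack == 0

print(regular_check("aaabb"))       # shape is fine
print(context_free_check("aaabb"))  # but the counts don't match
```

The regular matcher accepts `"aaabb"`; the context-free recognizer rejects it because 3 ≠ 2.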
Frank Rosenblatt & Marvin Minsky
Rosenblatt’s Perceptron (1958) was the first neural network that could learn from data. Minsky and Papert (1969) proved it couldn’t learn the simple XOR function — because a single layer can only learn linearly separable patterns. This triggered the first “AI Winter.” The solution — multi-layer networks — took 17 years to become practical.
The XOR problem killed neural network research for over a decade. The solution was so simple: add one more layer. But proving it worked required backpropagation.
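The fix can be written out by hand. The weights below are one hand-chosen solution, not Rosenblatt’s learned ones: no single threshold unit computes XOR, but a hidden layer of OR and NAND units followed by an AND does:

```python
def step(x):
    # Rosenblatt-style threshold activation.
    return 1 if x >= 0 else 0

def perceptron(x1, x2, w1, w2, b):
    # A single-layer perceptron: one linear threshold unit.
    return step(w1 * x1 + w2 * x2 + b)

# Minsky & Papert: no choice of (w1, w2, b) makes one unit compute XOR,
# because XOR is not linearly separable. Two layers suffice:
def xor_mlp(x1, x2):
    h1 = perceptron(x1, x2, 1, 1, -0.5)    # hidden unit 1: OR
    h2 = perceptron(x1, x2, -1, -1, 1.5)   # hidden unit 2: NAND
    return perceptron(h1, h2, 1, 1, -1.5)  # output unit: AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```

Hand-wiring the weights is easy; what took until backpropagation was *learning* them from data.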
Vaswani et al. (Google Brain)
“Attention Is All You Need” (2017) replaced all previous NLP architectures with a single elegant mechanism: self-attention. Unlike recurrent networks that process words sequentially, transformers process all words in parallel, learning which words relate to which. This is not rule-based logic — it’s learned, statistical, emergent reasoning.
The transformer doesn’t use Chomsky’s grammar rules. Instead, it learns its own implicit grammar from data. Whether this counts as “true understanding” is philosophy’s newest question.
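The core mechanism is compact enough to sketch. The toy below computes scaled dot-product self-attention in pure Python; as a simplification, the learned query/key/value projections are replaced by the identity, so this illustrates the attention pattern, not the full transformer:

```python
import math

def softmax(xs):
    # Numerically stable softmax: turns scores into attention weights.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of vectors X.
    Simplification: queries, keys, and values are all X itself; a real
    transformer learns separate W_Q, W_K, W_V projection matrices.
    """
    d = len(X[0])
    out = []
    for q in X:  # every position attends to every position, in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        weights = softmax(scores)  # how much this word "looks at" each word
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

# Three toy "word" vectors; each output is a weighted mix of all inputs.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(seq):
    print([round(x, 3) for x in row])
```

Each output vector is a convex combination of the input vectors — the “which words relate to which” is entirely in the learned weights.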
Chain-of-Thought, OpenAI o1, Claude
Chain-of-thought prompting (Wei et al., 2022) showed that LLMs reason better when they “think step by step.” OpenAI’s o1 model (2024) takes this further with explicit reasoning chains. Are LLMs truly reasoning, or merely pattern matching? This is Leibniz’s dream and Gödel’s limit colliding in real time — and the answer will reshape our understanding of intelligence.
When you ask Claude to “think step by step,” you’re invoking a technique rooted in 2,400 years of logical formalization — from Aristotle’s syllogisms to modern chain-of-thought reasoning.
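The prompt format is simple: a worked example whose intermediate steps are written out, followed by the target question. The sketch below follows the style of the tennis-ball example popularized by Wei et al.; the exact wording here is illustrative:

```python
# A chain-of-thought prompt in the style of Wei et al. (2022): one worked
# example with visible reasoning steps, then the question we actually want
# answered. The trailing "A:" invites the model to continue step by step.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
   How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
   The answer is 11.

Q: A cafeteria had 23 apples. They used 20 and bought 6 more.
   How many apples do they have?
A:"""

print(cot_prompt)
```

Without the worked example, models tend to jump straight to a (more often wrong) final answer; with it, they imitate the step-by-step pattern.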
The Thread That Connects
The journey from syllogisms to language models reveals a paradox: we spent 2,400 years trying to reduce thought to formal rules, and then built thinking machines that learned to reason without being given the rules. The dream of mechanized thought was achieved — but not in the way anyone expected.
Connections to Other Lectures
- Lecture 1: Probability & Uncertainty — The statistical nature of LLM reasoning: transformers don’t use formal logic, they use probability distributions over tokens.
- Lecture 6: Number Theory & Encoding — How encoding and representation turn language into numbers that machines can process — the bridge from Turing’s tape to modern tokenization.
- Lecture 10: Game Theory & Alignment — How AI alignment builds on the tension between formal rules and emergent behavior that Gödel and Turing first revealed.