Can a machine think? This question, which seems so modern, has roots stretching back 2,400 years. The path from Aristotle’s logical syllogisms to today’s language models is one of humanity’s greatest intellectual adventures — and at every step, mathematics was the bridge between thought and mechanism.
The Timeline
Aristotle
Aristotle created the first formal system of logic — the syllogism. “All men are mortal. Socrates is a man. Therefore, Socrates is mortal.” He showed that valid reasoning follows patterns, independent of content. This was the first hint that thinking could be reduced to rules.
Aristotle’s logic dominated Western thought for over 2,000 years, and it set the agenda for everything that followed: if valid reasoning follows patterns independent of content, then in principle those patterns can be mechanized.
Gottfried Leibniz & George Boole
Leibniz dreamed of a “calculus of reasoning” — a universal language where disputes could be settled by calculation: “Let us calculate!” Two centuries later, Boole made this real with Boolean algebra, showing that all logic could be reduced to AND, OR, NOT operations on 0s and 1s. This became the foundation of every computer.
Every CPU on Earth runs on Boolean logic. Every neural network computation is ultimately performed by Boolean gates — the legacy of Boole.
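Boole’s reduction is easy to see in code: every compound logical operation can be assembled from the three primitives. A minimal Python sketch, building XOR out of AND, OR, and NOT:

```python
# Boole's insight: all logic reduces to AND, OR, NOT on truth values.
def AND(a, b): return a and b
def OR(a, b): return a or b
def NOT(a): return not a

# Any other operation can be composed from these three primitives.
# XOR, for example, is "(a OR b) AND NOT (a AND b)":
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

for a in (False, True):
    for b in (False, True):
        print(a, b, "->", XOR(a, b))
```

Hardware gates perform exactly these compositions, billions of times per second.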
Kurt Gödel
Gödel shattered the dream of a complete, consistent mathematical system. His First Incompleteness Theorem: any consistent formal system powerful enough to express arithmetic contains true statements it cannot prove. His Second: no such system can prove its own consistency. These theorems revealed fundamental limits of formal reasoning — limits that AI also faces.
The Gödel sentence $G$ says: “$G$ is not provable in this system.” If $G$ is provable, the system is inconsistent. If $G$ is not provable, the system is incomplete.
Can an AI ever fully understand mathematics? Gödel showed that even perfect formal systems have blind spots. This remains one of the deepest unsolved questions in AI.
Alan Turing
In 1936, Turing imagined the simplest possible computing device: a tape, a head, and a table of rules. He proved that this simple machine could compute anything computable — and that some things (like determining whether a program will halt) are fundamentally undecidable. The Turing Machine became the theoretical foundation of all computing.
Turing also proposed the Turing Test (1950): can a machine fool a human into thinking it’s human? LLMs are now routinely passing informal versions of this test.
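The tape-head-rules model is simple enough to simulate in a few lines. The sketch below is a minimal illustration, not a standard API — the `run_turing_machine` helper and the bit-inverting rule table are invented for this example:

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Simulate a one-tape Turing machine.
    rules maps (state, symbol) -> (new_state, write_symbol, move),
    where move is -1 (left), +1 (right), or 0 (stay).
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, blank)
        state, cells[head], move = rules[(state, symbol)]
        head += move
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)

# A tiny machine that inverts every bit, then halts at the first blank.
invert = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", "_"): ("halt", "_", 0),
}
print(run_turing_machine(invert, "10110"))  # -> 01001
```

Everything a modern GPU computes, this loop could in principle compute too — just unimaginably slower.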
Noam Chomsky
Chomsky classified languages into a hierarchy of complexity: regular → context-free → context-sensitive → recursively enumerable. Each level requires more computational power to parse. Natural language is generally held to be mildly context-sensitive — beyond context-free, but short of the full context-sensitive level. This hierarchy shaped 50 years of natural language processing — until neural networks bypassed it entirely.
Chomsky’s hierarchy says natural language needs sophisticated grammars. Transformers seem to learn these grammars implicitly from data — without being told the rules.
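The jump in power between levels can be shown directly. A finite automaton (here, a regular expression) can check the *shape* of a string, but matching counts — the classic language aⁿbⁿ — needs the stack of a pushdown automaton. A small Python sketch:

```python
import re

def regular_check(s):
    # A finite automaton can verify the shape a...ab...b (the regular
    # language a*b*), but it cannot count how many of each it saw.
    return re.fullmatch(r"a*b*", s) is not None

def context_free_check(s):
    # Recognizing a^n b^n (equal counts) needs a counter/stack:
    # that is exactly the extra power of a context-free grammar.
    stack, seen_b = 0, False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False       # an 'a' after a 'b' breaks the shape
            stack += 1
        elif ch == "b":
            seen_b = True
            stack -= 1
            if stack < 0:
                return False       # more b's than a's so far
        else:
            return False
    return stack == 0

print(regular_check("aaabb"))       # shape is fine
print(context_free_check("aaabb"))  # but the counts don't match
```

The regular matcher accepts `"aaabb"`; the context-free recognizer rejects it because 3 ≠ 2.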
Frank Rosenblatt & Marvin Minsky
Rosenblatt’s Perceptron (1958) was the first neural network that could learn from data. Minsky and Papert (1969) proved it couldn’t learn the simple XOR function — because a single layer can only learn linearly separable patterns. This triggered the first “AI Winter.” The solution — multi-layer networks — took 17 years to become practical.
The XOR problem killed neural network research for over a decade. The solution was so simple: add one more layer. But proving it worked required backpropagation.
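The fix can be written out by hand. The weights below are one hand-chosen solution, not Rosenblatt’s learned ones: no single threshold unit computes XOR, but a hidden layer of OR and NAND units followed by an AND does:

```python
def step(x):
    # Rosenblatt-style threshold activation.
    return 1 if x >= 0 else 0

def perceptron(x1, x2, w1, w2, b):
    # A single-layer perceptron: one linear threshold unit.
    return step(w1 * x1 + w2 * x2 + b)

# Minsky & Papert: no choice of (w1, w2, b) makes one unit compute XOR,
# because XOR is not linearly separable. Two layers suffice:
def xor_mlp(x1, x2):
    h1 = perceptron(x1, x2, 1, 1, -0.5)    # hidden unit 1: OR
    h2 = perceptron(x1, x2, -1, -1, 1.5)   # hidden unit 2: NAND
    return perceptron(h1, h2, 1, 1, -1.5)  # output unit: AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```

Hand-wiring the weights is easy; what took until backpropagation was *learning* them from data.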
Vaswani et al. (Google Brain)
“Attention Is All You Need” (2017) replaced all previous NLP architectures with a single elegant mechanism: self-attention. Unlike recurrent networks that process words sequentially, transformers process all words in parallel, learning which words relate to which. This is not rule-based logic — it’s learned, statistical, emergent reasoning.
The transformer doesn’t use Chomsky’s grammar rules. Instead, it learns its own implicit grammar from data. Whether this counts as “true understanding” is philosophy’s newest question.
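The core mechanism is compact enough to sketch. The toy below computes scaled dot-product self-attention in pure Python; as a simplification, the learned query/key/value projections are replaced by the identity, so this illustrates the attention pattern, not the full transformer:

```python
import math

def softmax(xs):
    # Numerically stable softmax: turns scores into attention weights.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of vectors X.
    Simplification: queries, keys, and values are all X itself; a real
    transformer learns separate W_Q, W_K, W_V projection matrices.
    """
    d = len(X[0])
    out = []
    for q in X:  # every position attends to every position, in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        weights = softmax(scores)  # how much this word "looks at" each word
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

# Three toy "word" vectors; each output is a weighted mix of all inputs.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(seq):
    print([round(x, 3) for x in row])
```

Each output vector is a convex combination of the input vectors — the “which words relate to which” is entirely in the learned weights.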
Chain-of-Thought, OpenAI o1, Claude
Chain-of-thought prompting (Wei et al., 2022) showed that LLMs reason better when they “think step by step.” OpenAI’s o1 model (2024) takes this further with explicit reasoning chains. Are LLMs truly reasoning, or merely pattern matching? This is Leibniz’s dream and Gödel’s limit colliding in real time — and the answer will reshape our understanding of intelligence.
When you ask Claude to “think step by step,” you’re invoking a technique rooted in 2,400 years of logical formalization — from Aristotle’s syllogisms to modern chain-of-thought reasoning.
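The prompt format is simple: a worked example whose intermediate steps are written out, followed by the target question. The sketch below follows the style of the tennis-ball example popularized by Wei et al.; the exact wording here is illustrative:

```python
# A chain-of-thought prompt in the style of Wei et al. (2022): one worked
# example with visible reasoning steps, then the question we actually want
# answered. The trailing "A:" invites the model to continue step by step.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
   How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
   The answer is 11.

Q: A cafeteria had 23 apples. They used 20 and bought 6 more.
   How many apples do they have?
A:"""

print(cot_prompt)
```

Without the worked example, models tend to jump straight to a (more often wrong) final answer; with it, they imitate the step-by-step pattern.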
The Thread That Connects
The journey from syllogisms to language models reveals a paradox: we spent 2,400 years trying to reduce thought to formal rules, and then built thinking machines that learned to reason without being given the rules. The dream of mechanized thought was achieved — but not in the way anyone expected.
Connections to Other Lectures
- Lecture 1: Probability & Uncertainty — The statistical nature of LLM reasoning: transformers don’t use formal logic, they use probability distributions over tokens.
- Lecture 6: Number Theory & Encoding — How encoding and representation turn language into numbers that machines can process — the bridge from Turing’s tape to modern tokenization.
- Lecture 10: Game Theory & Alignment — How AI alignment builds on the tension between formal rules and emergent behavior that Gödel and Turing first revealed.