Lecture 2: Perceptron Fundamentals

Duration: ~45 minutes

Slides: 32

Prerequisites: Lecture 1

Learning Objectives

After completing this lecture, you should be able to:

Define the mathematical structure of a perceptron
Explain the role of weights, bias, and activation functions
Visualize decision boundaries in 2D
Implement the perceptron learning algorithm
Identify the XOR problem and understand why it matters
Apply perceptrons to simple classification tasks

Key Concepts

1. The Perceptron Architecture

A perceptron is the simplest possible neural network: a single artificial neuron.

Click chart to view Python source code

Components:

Component	Symbol	Description
Inputs	x_1, x_2, …, x_n	Feature values
Weights	w_1, w_2, …, w_n	Learned parameters
Bias	b	Threshold adjustment
Net Input	z	Weighted sum + bias
Activation	f(z)	Decision function
Output	y	Final prediction

2. Mathematical Formulation

Step 1: Compute the Net Input (z)

The perceptron first computes a weighted sum of inputs:

z = w_1*x_1 + w_2*x_2 + ... + w_n*x_n + b

In vector notation:

z = w^T * x + b

Where:

w = weight vector [w_1, w_2, …, w_n]
x = input vector [x_1, x_2, …, x_n]
b = bias (scalar)

Step 2: Apply the Activation Function

The step function converts z to a binary output:

y = f(z) = { 1 if z >= 0
           { 0 if z < 0

Alternative threshold notation:

y = { 1 if z >= threshold
    { 0 if z < threshold

The bias b effectively shifts the threshold: z >= 0 is equivalent to w^T*x >= -b.

3. Weights: The Importance of Features

Weights determine how much each input influences the output.

Weight Value	Interpretation
Large positive	Feature strongly supports class 1
Large negative	Feature strongly supports class 0
Near zero	Feature has little influence

Finance Example: Stock Screener

Feature	Weight	Interpretation
P/E Ratio	-0.5	Higher P/E slightly decreases buy signal
Momentum	+2.0	Strong positive momentum strongly increases buy signal
Volume	+0.3	Higher volume slightly supports buying
Debt/Equity	-1.5	High debt moderately decreases buy signal

The perceptron learns these weights from training data!

4. The Bias Term

The bias b controls the decision threshold independently of input values.

Intuition: The bias determines how “easily” the neuron fires.

Positive bias: Lower bar to fire (more likely to output 1)
Negative bias: Higher bar to fire (more likely to output 0)

Geometric interpretation: The bias shifts the decision boundary away from the origin.

5. Decision Boundaries

A perceptron creates a linear decision boundary - a hyperplane that separates two classes.

In 2D (two features), the decision boundary is a line defined by:

w_1*x_1 + w_2*x_2 + b = 0

Solving for x_2:

x_2 = -(w_1/w_2)*x_1 - (b/w_2)

This is a line with:

Slope = -w_1/w_2
Intercept = -b/w_2

Click chart to view Python source code

Key Insight: Everything on one side of the line is classified as 1, everything on the other side as 0.

6. The Perceptron Learning Algorithm

The perceptron learns by adjusting weights when it makes mistakes.

Algorithm:

1. Initialize weights w and bias b (often to zeros or small random values)
2. For each training example (x, y_true):
   a. Compute prediction: y_pred = f(w^T * x + b)
   b. Compute error: error = y_true - y_pred
   c. Update weights: w = w + learning_rate * error * x
   d. Update bias: b = b + learning_rate * error
3. Repeat until convergence or max iterations

Update Rule Explained:

y_true	y_pred	error	Action
1	1	0	No update (correct)
0	0	0	No update (correct)
1	0	+1	Increase weights toward x
0	1	-1	Decrease weights away from x

Learning Rate (eta): Controls the step size of updates

Too large: Overshoots, unstable learning
Too small: Very slow convergence
Typical values: 0.01 to 0.1

7. Perceptron Convergence Theorem

Theorem: If the training data is linearly separable, the perceptron learning algorithm will converge to a solution in a finite number of steps.

Implications:

Guaranteed to find a separating hyperplane (if one exists)
No guarantee on number of iterations
No guarantee of finding the “best” hyperplane

Limitation: If data is NOT linearly separable, the algorithm will never converge.

8. The XOR Problem

The XOR (exclusive or) function demonstrates the fundamental limitation of single perceptrons.

XOR Truth Table:

x_1	x_2	XOR(x_1, x_2)
0	0	0
0	1	1
1	0	1
1	1	0

Click chart to view Python source code

Why can’t a perceptron solve XOR?

Plot the points:

(0,0) -> 0 (class 0)
(0,1) -> 1 (class 1)
(1,0) -> 1 (class 1)
(1,1) -> 0 (class 0)

No single straight line can separate the 0s from the 1s!

Solution preview: Multi-layer perceptrons (MLPs) can solve XOR by combining multiple perceptrons. This is covered in Lecture 3.

9. Linear Separability

Definition: A dataset is linearly separable if there exists a hyperplane that perfectly separates the two classes.

Examples of linearly separable problems:

AND gate
OR gate
Simple threshold decisions

Examples of non-linearly separable problems:

XOR gate
Circular class boundaries
Many real-world classification problems

Finance Connection: Many financial classification problems (e.g., “will this stock outperform?”) are NOT linearly separable, which is why we need more powerful models.

Key Formulas

Perceptron Output

y = f(sum_{i=1}^{n} w_i * x_i + b)

Step Activation Function

f(z) = { 1 if z >= 0
       { 0 otherwise

Weight Update Rule

w_i(new) = w_i(old) + eta * (y_true - y_pred) * x_i

Bias Update Rule

b(new) = b(old) + eta * (y_true - y_pred)

Decision Boundary (2D)

w_1*x_1 + w_2*x_2 + b = 0

Finance Application: Binary Stock Classifier

Problem: Given financial metrics, classify stocks as “Buy” or “Pass”

Features (inputs):

x_1 = Normalized P/E ratio
x_2 = 6-month momentum (%)
x_3 = Normalized trading volume
x_4 = Debt-to-equity ratio
x_5 = Earnings surprise (%)

Output:

y = 1: Buy
y = 0: Pass

Training data: Historical stocks with known outcomes (did they outperform the benchmark?)

Learned interpretation: After training, examine the weights:

Large positive w_2 (momentum): Momentum is a strong buy signal
Negative w_1 (P/E): High P/E stocks are less attractive
etc.

Practice Questions

Mathematical Understanding

Q1: Given weights w = [2, -1] and bias b = -1, what is the output for input x = [1, 0]?

Answer

z = w^T * x + b = (2)(1) + (-1)(0) + (-1) = 2 + 0 - 1 = 1 Since z = 1 >= 0, the output y = 1.

Q2: For the same perceptron (w = [2, -1], b = -1), what is the equation of the decision boundary?

Answer

The decision boundary is where z = 0: 2*x_1 + (-1)*x_2 + (-1) = 0 2*x_1 - x_2 - 1 = 0 x_2 = 2*x_1 - 1 This is a line with slope 2 and y-intercept -1.

Q3: A perceptron makes an error: y_true = 1 but y_pred = 0. The input is x = [3, 2] and learning rate is 0.1. If the current weights are w = [0.5, 0.5] and b = 0, what are the new weights and bias?

Answer

error = y_true - y_pred = 1 - 0 = 1 w_new = w_old + eta * error * x w_new = [0.5, 0.5] + 0.1 * 1 * [3, 2] w_new = [0.5 + 0.3, 0.5 + 0.2] w_new = [0.8, 0.7] b_new = b_old + eta * error b_new = 0 + 0.1 * 1 = 0.1

Conceptual Understanding

Q4: Why is the bias term necessary? What happens if we remove it?

Answer

Without bias, the decision boundary must pass through the origin (0, 0). This severely limits which problems can be solved. With bias, we can shift the decision boundary anywhere in the feature space, making the perceptron much more flexible.

Q5: Can a perceptron learn the AND function? What about the OR function?

Answer

Yes to both! AND and OR are linearly separable. For AND (both inputs must be 1): - Weights w = [1, 1], bias b = -1.5 - z = x_1 + x_2 - 1.5 - Only (1,1) gives z = 0.5 > 0 For OR (at least one input is 1): - Weights w = [1, 1], bias b = -0.5 - z = x_1 + x_2 - 0.5 - (1,0), (0,1), and (1,1) all give z > 0

Q6: Why is XOR important in the history of neural networks?

Answer

XOR demonstrated a fundamental limitation of single-layer perceptrons. Minsky and Papert proved mathematically that no single perceptron can solve XOR. This contributed to the first AI winter by showing that perceptrons couldn't solve many interesting problems. However, this limitation was later overcome with multi-layer networks.

Application

Q7: You’re building a perceptron for credit approval with features: income (x_1), credit score (x_2), existing loans (x_3). After training, the weights are w = [0.8, 1.2, -0.5]. Interpret these weights.

Answer

- w_1 = 0.8 (income): Higher income moderately increases approval probability - w_2 = 1.2 (credit score): Higher credit score strongly increases approval probability (most important feature) - w_3 = -0.5 (existing loans): More existing loans moderately decreases approval probability Credit score has the largest absolute weight, making it the most influential feature in the decision.

Reading List

Essential Reading

Rosenblatt (1958) - “The Perceptron: A Probabilistic Model” - The original paper
Nielsen, Chapter 1 - Neural Networks and Deep Learning (online)

Mathematical Deep Dive

Novikoff (1962) - “On Convergence Proofs on Perceptrons” - The convergence theorem proof
Goodfellow et al., Chapter 6 - Deep Learning textbook

Historical Context

Minsky & Papert (1969) - “Perceptrons” - The famous critique

Video Resources

3Blue1Brown - “Gradient descent, how neural networks learn” (YouTube)

Summary

This lecture covered:

Perceptron architecture - Inputs, weights, bias, activation, output
Mathematical formulation - z = w^T * x + b, y = f(z)
Decision boundaries - Linear hyperplanes separating classes
Learning algorithm - Adjust weights based on errors
Convergence theorem - Guaranteed convergence for linearly separable data
XOR problem - The fundamental limitation that led to multi-layer networks

Key Takeaway: A single perceptron is a powerful but limited classifier. It can only solve linearly separable problems.

Next Lecture: MLP Architecture - We’ll see how stacking perceptrons into layers overcomes the XOR limitation.

Previous: History Home Next: MLP