Lesson 7.1: The Neuron - From Perceptron to ReLU
Welcome to Module 7. We now leave the world of classical statistics and enter the realm of Deep Learning. To understand a neural network, we must first understand its fundamental building block: the single artificial neuron. This lesson traces its evolution from the simple Perceptron to the modern, powerful non-linear activation functions that make deep learning possible.
Part 1: The Biological Inspiration
An artificial neuron is a loose mathematical model of a biological neuron. A biological neuron receives electrical signals from other neurons through its dendrites. It sums these incoming signals, and if the total signal exceeds a certain threshold, the neuron "fires," sending its own signal down its axon to other neurons. It's a simple on/off switch.
Part 2: The Artificial Neuron - A Two-Step Process
An artificial neuron mimics this process in two steps.
The Neuron's Calculation
Step 1: The Linear Step (Weighted Sum)
The neuron receives a set of inputs, $x_1, x_2, \dots, x_n$. Each input has an associated **weight**, $w_i$. The neuron calculates a weighted sum of its inputs and adds a **bias**, $b$:

$$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = \sum_{i=1}^{n} w_i x_i + b$$

This is just the equation for **Linear Regression**! The "learning" process is finding the best weights $w_i$ and bias $b$.
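To make the linear step concrete, here is a minimal NumPy sketch (the variable names `inputs`, `weights`, and `bias` are illustrative, not from any particular library):

```python
import numpy as np

# Illustrative values for a single neuron with three inputs
inputs = np.array([0.5, -1.2, 3.0])    # x_1, x_2, x_3
weights = np.array([0.8, 0.1, -0.4])   # w_1, w_2, w_3
bias = 2.0                             # b

# Step 1: the linear step -- a weighted sum of the inputs plus the bias
z = np.dot(weights, inputs) + bias
print(z)  # 0.8*0.5 + 0.1*(-1.2) + (-0.4)*3.0 + 2.0 = 1.08
```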
Step 2: The Non-Linear Step (Activation Function)
The neuron then passes this weighted sum through a non-linear **activation function**, $f$, to produce its final output, $a$:

$$a = f(z) = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
This activation function is the "firing" mechanism. Without it, a stack of neurons would just be a series of linear combinations, which would be equivalent to a single, simple linear model.
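Continuing the sketch above, the second step simply applies an activation function to $z$. The sigmoid is used here purely as an example choice of $f$; the helper name `sigmoid` is our own:

```python
import numpy as np

def sigmoid(z):
    # One possible activation function f: squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Step 2: the non-linear step -- pass the weighted sum through the activation
z = 1.08           # the weighted sum computed in Step 1
a = sigmoid(z)     # the neuron's final output
print(a)           # ~0.746
```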
Part 3: The Evolution of Activation Functions
The choice of activation function is critical. The history of deep learning is, in many ways, the history of finding better activation functions.
- The Perceptron (1957): Used a simple step function. If $z > 0$, output 1. Otherwise, output 0. It's a hard "on/off" switch. The problem? Its derivative is zero almost everywhere, making it impossible to train with Gradient Descent.
- Sigmoid and Tanh: For many years, functions like the Sigmoid ($\sigma(z) = \frac{1}{1 + e^{-z}}$) and hyperbolic tangent (tanh) were dominant. They are smooth and "S-shaped," which fixed the derivative problem. However, they suffer from the **vanishing gradient problem**: for large positive or negative inputs, their slope becomes nearly flat (close to zero), which effectively "stalls" the learning process in deep networks.
- The Modern Champion: ReLU (Rectified Linear Unit): This is the default activation function used in almost all modern neural networks. Its definition is shockingly simple: $f(z) = \max(0, z)$. If the input is positive, it passes through unchanged; if the input is negative, the output is zero. This simple function solves the vanishing gradient problem for positive inputs and is computationally very cheap (see the sketch after this list).
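As a rough illustration of why this history moved toward ReLU, the sketch below (NumPy only; all function names are our own) implements the step function, sigmoid, and ReLU along with the two gradients that matter, then evaluates those gradients at a large input where the sigmoid has gone nearly flat but ReLU has not:

```python
import numpy as np

def step(z):
    # Perceptron-style step function: hard on/off, zero gradient almost everywhere
    return np.where(z > 0, 1.0, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                 # flattens toward 0 for large |z|

def relu(z):
    return np.maximum(0.0, z)            # max(0, z)

def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)     # gradient is 1 for any positive input

z = 10.0
print(sigmoid_grad(z))   # ~4.5e-05 -> the "vanishing" gradient
print(relu_grad(z))      # 1.0      -> the gradient passes through intact
```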
What's Next? Building a Brain
A single neuron is just a slightly modified linear or logistic regression model. It's not very powerful on its own.
The true power of deep learning comes from stacking these simple computational units together in layers to form a **Multi-Layer Perceptron (MLP)**—the classic "neural network." In the next lesson, we will see how arranging neurons into layers allows the network to learn increasingly complex and abstract features from the data.