Lesson 8.1: The Problem of Memory: Why MLPs Fail on Sequence Data
Welcome to Module 8. We have built powerful neural networks (MLPs and CNNs), but they all share a fundamental flaw for many financial tasks: they have no memory. This lesson explains why a feed-forward architecture is unsuitable for sequence data and sets the stage for a new class of networks designed specifically to learn from the past.
Part 1: The 'Goldfish Brain' of an MLP
A standard Multi-Layer Perceptron (MLP) is a **feed-forward** network. Information flows in one direction only: from the input layer, through the hidden layers, to the output layer. The network processes each input completely independently of all other inputs.
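To make this concrete, here is a minimal sketch (assumed NumPy code, not from the lesson) of a feed-forward pass. Every call depends only on the current input, so nothing carries over from one observation to the next.

```python
# A minimal, illustrative MLP forward pass: information flows input -> hidden -> output.
# Nothing from one call persists into the next call.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input  -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

def mlp_forward(x):
    """Feed-forward pass: the output is a function of the current input x only."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

x_monday  = rng.normal(size=4)
x_tuesday = rng.normal(size=4)

print(mlp_forward(x_tuesday))
mlp_forward(x_monday)                 # showing it Monday's data changes nothing...
print(mlp_forward(x_tuesday))         # ...Tuesday's prediction is identical: no memory
```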
Imagine you are trying to predict the next word in a sentence. You feed the words to an MLP one at a time:
The Core Analogy: The Amnesiac Reader
You show the MLP the sentence "The cat sat on the ___."
- You show it the word "The". It processes it and makes a random guess. Then it completely forgets it ever saw "The".
- You show it the word "cat". It processes it and makes a random guess. It has no memory of the word "The" that came before.
- You show it the word "sat". It has no memory of "The cat".
An MLP has the memory of a goldfish. By the time it sees the word "on," it has no recollection of the subject of the sentence ("cat"). It has no context. It is impossible for it to predict the word "mat" because it doesn't know what is sitting.
This is the exact same problem we have with time series data. To predict tomorrow's stock price, a model *must* have access to the sequence of prices that came before it.
Part 2: The 'Feature Engineering' Workaround (and its Flaw)
How did we solve this in Module 6? We used **feature engineering**. Instead of feeding the MLP one price at a time, we transformed the problem. To predict the price at time t, we created a single input vector containing a fixed number of past observations: the price at t-1, the price at t-2, and so on back to t-k.
This is the "sliding window" approach. We manually provide the model with a fixed-size window of past observations as its features.
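As a concrete illustration, here is a minimal sketch of this windowing step, assuming a plain NumPy price series; the names (prices, window_size) are illustrative rather than taken from Module 6.

```python
# Build sliding-window features: each row of X is a fixed-size window of past prices,
# and the corresponding entry of y is the next price we want to predict.
import numpy as np

prices = np.array([101.2, 102.5, 101.8, 103.0, 104.1, 103.7, 105.2, 106.0])
window_size = 3  # the fixed look-back we have to choose by hand

X, y = [], []
for t in range(window_size, len(prices)):
    X.append(prices[t - window_size:t])  # the last `window_size` prices become the features
    y.append(prices[t])                  # the next price is the target

X, y = np.array(X), np.array(y)
print(X.shape)  # (5, 3): each row is one flat window fed to the MLP
print(y.shape)  # (5,)
```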
The Flaw in this Workaround: A Fixed, Short-Term Memory
This works, but it's a crude solution with two major limitations:
- The window size is fixed. What is the right look-back window size k? 10 days? 50 days? 200 days? We have to guess. If the true dependency in the data is from 100 days ago, but our window is only 50 days, the model will never see it.
- Parameter sharing is non-existent. The weight the MLP learns for the feature "price at t-1" is completely independent of the weight it learns for "price at t-2". It doesn't understand that these are just time-shifted versions of the same concept. It has to learn the patterns at each position in the window from scratch, which is incredibly inefficient (see the sketch after this list).
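A small sketch of that second flaw, under the assumption that the MLP's first layer is an ordinary dense layer: each lag position in the window gets its own column of weights, so nothing is shared across positions.

```python
# Illustrative only: the first dense layer of an MLP over a 50-day window.
import numpy as np

window_size, hidden_units = 50, 16
rng = np.random.default_rng(0)

# One independent set of weights per lag position: the weights applied to price[t-1]
# have nothing in common with those applied to price[t-2], so the same temporal
# pattern must be relearned separately at every position in the window.
W1 = rng.normal(size=(window_size, hidden_units))
print(W1.size)  # 800 free parameters just to read 50 lagged prices
```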
Part 3: The Solution - A Brain with a 'Memory Loop'
We need a new type of network architecture—one that has a built-in mechanism for memory. We need to break the strictly feed-forward structure and introduce a **loop**.
Instead of information only flowing forward, what if the output of a layer at one time step could be fed back into itself as an *input* for the next time step?
Introducing the 'Recurrent' Connection
This is the core idea of a **Recurrent Neural Network (RNN)**. An RNN neuron doesn't just produce an output; it also maintains an internal "memory" called the **hidden state**.
Imagine a diagram: An input Xt goes into a box (the neuron). The box produces an output Yt, but also an internal state Ht. An arrow loops from the box back to itself, showing that Ht is also used as an input for the next step, along with Xt+1.
At each time step t, the RNN neuron does two things:
- It produces an output, Yt, based on the current input, Xt, and its *previous* hidden state, Ht-1.
- It updates its hidden state to a new state, Ht, based on the current input, Xt, and its *previous* hidden state, Ht-1.
This "memory loop" allows information to persist and flow through time. The hidden state acts as a summary of all the important information the network has seen up to that point. When the RNN sees the word "on" in our sentence, its hidden state still contains the "memory" that the subject of the sentence was "cat."
What's Next? Building the RNN
We have identified the problem (MLPs have no memory) and proposed a solution (adding a recurrent loop to create a hidden state).
In the next lesson, we will dive into the mathematics of this loop. We will formally define the equations for the **Recurrent Neural Network (RNN)** and see how the concept of **backpropagation through time** allows us to train a network that has memory.