Lesson 7.8: Introduction to Convolutional Neural Networks (CNNs)

Before we dive into sequences, we take a brief but important detour to introduce the specialized architecture that revolutionized computer vision. CNNs are designed to find patterns in spatial data, and their core ideas of 'feature detectors' and 'parameter sharing' have profound implications for other fields, including financial time series analysis.

Part 1: The Problem with MLPs for Images

Imagine you want to build a neural network to recognize handwritten digits from a small 28x28 pixel grayscale image. If we use our standard Multi-Layer Perceptron (MLP) from the last lesson, we would have to "flatten" the image into a single, long vector of 784 pixels (28 × 28 = 784).

This approach has two catastrophic problems:

  1. It Destroys Spatial Structure: The MLP has no idea that pixel (row 5, col 5) is "next to" pixel (row 5, col 6). It treats every pixel as an independent feature. All the crucial spatial information—the edges, corners, and shapes—is completely lost.
  2. It's Computationally Massive: If the first hidden layer has just 100 neurons, the first connection alone requires 784 × 100 = 78,400 weights. For a modern high-resolution color image, the count runs into the hundreds of millions or more. It's completely impractical.

We need a new architecture that is designed to "see" patterns in a grid and to do so efficiently.
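To make the cost concrete, here is a minimal sketch of the arithmetic, using a random array as a stand-in for an image (the sizes are the ones from the text; the 1000x1000 color image is an illustrative assumption):

```python
import numpy as np

# Flattening a 28x28 grayscale image destroys the grid structure:
# the MLP sees 784 independent numbers, not neighbouring pixels.
image = np.random.rand(28, 28)
flat = image.flatten()
print(flat.shape)  # (784,)

# Weights needed for one fully connected layer of 100 neurons.
hidden_neurons = 100
weights_first_layer = flat.size * hidden_neurons
print(weights_first_layer)  # 78400

# The same layer on a hypothetical 1000x1000 RGB image:
color_inputs = 1000 * 1000 * 3
print(color_inputs * hidden_neurons)  # 300000000
```

The weight count scales linearly with the number of pixels, which is exactly why a dense first layer becomes unmanageable for real images.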

Part 2: The CNN's Solution - Local Feature Detectors

The Core Analogy: A 'Feature Flashlight'

A CNN doesn't look at the whole image at once. It uses a small "flashlight," called a **kernel** or **filter**, and slides it across the entire image.

This "flashlight" is not for seeing; it's for **detecting**. It's a tiny pattern-matching machine. For example:

  • One kernel might be a "vertical edge detector." As it slides across the image, it "lights up" (produces a high value) whenever it finds a vertical edge.
  • Another kernel might be a "horizontal edge detector."
  • A third might be a "45-degree line detector."

The crucial insight is that these kernels are **learned**. The network uses backpropagation to learn the best possible set of "flashlights" to find the most useful patterns (edges, corners, curves) in the data.
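A minimal sketch of the "flashlight" in action: a hand-written vertical edge kernel slides over a toy image whose left half is bright and right half is dark. In a real CNN these nine weights would be learned by backpropagation, not written by hand.

```python
import numpy as np

# Hand-crafted vertical edge detector: +1 on the left, -1 on the right.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Toy 5x5 image: bright left half, dark right half -> a vertical edge.
image = np.array([[1, 1, 0, 0, 0]] * 5)

# Slide the kernel over every 3x3 patch ("valid" convolution).
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(out)  # high values where the edge sits, zeros in the flat dark region
```

The output "lights up" (value 3) at positions covering the bright-to-dark transition and stays at 0 over the uniform region, which is precisely the pattern-matching behaviour described above.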

The Two Superpowers of CNNs

  1. Parameter Sharing: This is the solution to the "massive computation" problem. The *same* small kernel (e.g., a 3x3 grid of 9 weights) is used across the entire image. Instead of learning a separate weight for every pixel, the network learns just one "vertical edge detector" and then *reuses* it everywhere. This is incredibly efficient.
  2. Spatial Hierarchy: A CNN is built in layers.
    • The first layer learns to detect simple patterns (edges, corners).
    • The next layer takes the "maps" of where the edges are and learns to combine them into more complex shapes (eyes, noses, wheels).
    • The next layer combines eyes and noses into faces.
    This hierarchical learning of features is what makes CNNs so powerful.
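The efficiency of parameter sharing can be sketched in two lines of arithmetic, using the 28x28 image and 3x3 kernel from the text:

```python
# One shared 3x3 kernel vs. a dense layer of 100 neurons on a 28x28 image.
kernel_weights = 3 * 3            # 9 weights, reused everywhere
positions = (28 - 3 + 1) ** 2     # 676 placements of the same kernel
dense_weights = 28 * 28 * 100     # 78,400 independent weights

print(kernel_weights)  # 9
print(positions)       # 676
print(dense_weights)   # 78400
```

Nine weights do the work at 676 positions; the dense layer needs a fresh weight for every pixel-neuron pair.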

Part 3: The Building Blocks of a CNN

A typical CNN architecture is a sequence of three main types of layers.

1. The Convolutional Layer

This is the core layer where the "feature flashlights" (kernels) are applied. The output is a set of "feature maps" that show where in the image each feature was detected.
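A minimal sketch of a convolutional layer in NumPy, assuming a bank of 8 randomly initialised 3x3 kernels (training would tune them into useful detectors). Each kernel produces one feature map, and a "valid" convolution shrinks each side by kernel size minus one:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over every patch."""
    h, w = image.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

image = np.random.rand(28, 28)
kernels = [np.random.randn(3, 3) for _ in range(8)]  # 8 learnable "flashlights"
feature_maps = np.stack([conv2d(image, k) for k in kernels])
print(feature_maps.shape)  # (8, 26, 26): one 26x26 map per kernel, 28 - 3 + 1 = 26
```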

2. The Activation Layer (ReLU)

Just like in our MLP, a non-linear activation function like ReLU is applied after the convolution to introduce non-linearity, allowing the network to learn more complex patterns.
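ReLU is a one-liner: it keeps positive responses ("feature found here") and zeroes out everything else, without changing the map's shape.

```python
import numpy as np

feature_map = np.array([[3.0, -1.5],
                        [-0.2, 0.7]])
activated = np.maximum(0.0, feature_map)  # element-wise max(0, x)
print(activated)  # [[3.  0. ]
                  #  [0.  0.7]]
```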

3. The Pooling Layer

The goal of pooling is to downsample the feature maps, making the representation smaller and more manageable. A common method is **Max Pooling**. It takes a small window (e.g., 2x2) of the feature map and keeps only the maximum value. This makes the representation more robust to small translations of the feature in the image.
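A minimal 2x2 max-pooling sketch: split the feature map into 2x2 tiles and keep each tile's maximum, halving both dimensions.

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [1, 1, 3, 7]], dtype=float)

# Reshape into 2x2 tiles, then take the max within each tile.
h, w = feature_map.shape
pooled = feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 2.]
#  [2. 9.]]
```

Note that nudging a high value one pixel within its 2x2 tile leaves the pooled output unchanged, which is the translation robustness mentioned above.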

A full CNN will typically stack these three blocks multiple times (CONV → RELU → POOL) before finally flattening the output and feeding it into a standard MLP for the final classification.
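The shape bookkeeping for such a stack can be traced with simple arithmetic. This sketch assumes 3x3 kernels with no padding and 2x2 pooling; the sizes are illustrative, not a prescribed architecture:

```python
def block(size, kernel=3, pool=2):
    """One CONV -> RELU -> POOL block: per-side size after the block."""
    size = size - kernel + 1  # valid convolution shrinks each side
    return size // pool       # 2x2 pooling halves each side (ReLU keeps the shape)

size = 28
size = block(size)  # 28 -> conv -> 26 -> pool -> 13
size = block(size)  # 13 -> conv -> 11 -> pool -> 5
print(size)         # 5: flatten each 5x5 map and feed the vector to an MLP
```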

Part 4: Applications in Finance

While CNNs were born in computer vision, their ability to find local patterns in grid-like data has found powerful applications in finance.

Financial Applications of CNNs
    • Analyzing Time Series as "Images": Quants can take a multivariate time series (e.g., the last 50 days of Open, High, Low, Close prices plus Volume) and represent it as a 50x5 "image." A 1D or 2D CNN can then be applied to this "image" to learn predictive patterns (like "head and shoulders" or "flags") automatically, without having to manually code them. This is a powerful form of automated technical analysis.
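A minimal sketch of the 1D-convolution idea on such a 50x5 "image", using random standardised data as a stand-in for real OHLCV history. The kernel spans a 10-day window (an assumed width) across all 5 channels; in practice its weights would be learned:

```python
import numpy as np

# Toy multivariate series: 50 days x 5 channels (O, H, L, C, Volume).
rng = np.random.default_rng(0)
series = rng.standard_normal((50, 5))

# One 1D kernel: 10 days wide, covering all 5 channels at once.
window = 10
kernel = rng.standard_normal((window, 5))

# Slide the kernel along the time axis: one response per window position.
responses = np.array([np.sum(series[t:t + window] * kernel)
                      for t in range(50 - window + 1)])
print(responses.shape)  # (41,) -- a 1D "feature map" over time
```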
    • Option "Volatility Surfaces": The implied volatilities of options across different strike prices and expiration dates form a 2D grid or "surface." A CNN can be trained to recognize the shape of this surface to predict future volatility movements.

What's Next? Adding Memory to the Brain

The CNN architecture brilliantly solves the problem of spatial data. But what about sequential data, like a sentence of text or a time series of stock prices? While a CNN can find local patterns, it has no inherent sense of **order** or **long-term memory**.

This brings us to the main event of our next module. In **Module 8: Deep Learning for Sequences**, we will finally tackle the problem of time directly by introducing a new kind of neural network—one that has loops and memory. We will begin by formally defining the problem of sequences and exploring why our MLP and CNN architectures are insufficient.