Lesson 8.8: Capstone - Forecasting Volatility with an LSTM

This is the final exam for Module 8. We will apply our most sophisticated sequence model—the LSTM—to the problem of forecasting stock market volatility. This project synthesizes deep learning concepts with practical financial data science, demonstrating how to prepare sequence data, build an LSTM architecture, and evaluate its forecasting performance.

Part 1: The Problem and The Goal

The Business Problem: A volatility trading desk needs a sophisticated model to forecast future realized volatility. While GARCH provides a strong baseline and XGBoost can incorporate many features, a deep learning model like an LSTM might be able to capture more complex, non-linear dynamics and longer-term dependencies from the raw sequence of returns.

The Machine Learning Goal: We will build a **sequence-to-value regression model**. The model will take a sequence of past returns (e.g., the last 60 days) as input and predict a single value: the realized volatility over the next 21 days.
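
To make the target concrete, here is a minimal sketch of how one realized-volatility value is computed, using simulated daily log returns (the random numbers and the √252 annualization convention are purely illustrative, not the desk's data):

import numpy as np

rng = np.random.default_rng(42)                                  # simulated returns, for illustration only
daily_log_returns = rng.normal(loc=0.0, scale=0.01, size=21)     # 21 hypothetical daily log returns
realized_vol = daily_log_returns.std(ddof=1) * np.sqrt(252)      # annualize with ~252 trading days per year
print(f"Annualized 21-day realized volatility: {realized_vol:.2%}")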

Part 2: The Deep Learning Workflow for Time Series

The Professional's Checklist

  1. Data Preparation: Load daily price data and calculate log returns.
  2. Target Variable Engineering: Create our target variable: the 21-day future realized volatility.
  3. Feature Scaling: Scale the input features (the returns) using `StandardScaler`.
  4. Sequence Creation: This is the most important new step. We must transform our flat time series into overlapping "windows" or sequences of a fixed length to feed into the LSTM (a short toy sketch of this windowing appears right after this checklist).
  5. Train-Test Split: Perform a chronological walk-forward split to prevent look-ahead bias.
  6. Model Architecture: Build an LSTM model in Keras/TensorFlow.
  7. Training & Evaluation: Train the model on the sequence data and evaluate its performance (RMSE) on the test set.
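
To make step 4 concrete, here is a toy sketch of the windowing transformation using made-up numbers and a sequence length of 3 instead of 60; the `create_sequences` function in Part 3 does exactly this on the scaled returns:

import numpy as np

data = np.arange(10, dtype=float).reshape(-1, 1)   # stand-in for the scaled returns, shape (10, 1)
target = np.arange(100.0, 110.0)                   # stand-in for the future-volatility target
seq_len = 3
X = np.array([data[i:i + seq_len] for i in range(len(data) - seq_len)])
y = np.array([target[i + seq_len] for i in range(len(data) - seq_len)])
print(X.shape, y.shape)  # (7, 3, 1) and (7,): (samples, timesteps, features) and one target value per window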

Part 3: The Complete Python Implementation

We will now walk through the complete, executable code for this project.

import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# --- 1. Data Preparation & Feature Engineering ---
# auto_adjust=False keeps the 'Adj Close' column (recent yfinance versions adjust prices by default)
spy = yf.download('SPY', start='2005-01-01', end='2023-12-31', auto_adjust=False)
if isinstance(spy.columns, pd.MultiIndex):  # some yfinance versions return (field, ticker) column pairs
    spy.columns = spy.columns.get_level_values(0)
spy['log_return'] = np.log(spy['Adj Close'] / spy['Adj Close'].shift(1))
# Trailing 21-day realized volatility, annualized
spy['realized_vol_21d'] = spy['log_return'].rolling(window=21).std() * np.sqrt(252)
# Target: the realized volatility over the NEXT 21 days (shift the trailing measure forward in time)
spy['target'] = spy['realized_vol_21d'].shift(-21)
spy.dropna(inplace=True)

# --- 2. Feature Scaling ---
# Fit the scaler on the training portion only, so test-set statistics never leak in (look-ahead bias)
returns = spy[['log_return']].values
split_row = int(len(returns) * 0.8)
scaler = StandardScaler()
scaler.fit(returns[:split_row])
returns_scaled = scaler.transform(returns)

# --- 3. Sequence Creation ---
def create_sequences(data, target, sequence_length):
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:(i + sequence_length)])
        y.append(target[i + sequence_length])
    return np.array(X), np.array(y)

SEQ_LENGTH = 60 # Use the last 60 days of returns to predict
X, y = create_sequences(returns_scaled, spy['target'].values, SEQ_LENGTH)
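# X has shape (samples, 60, 1) -> (samples, timesteps, features), the 3-D input Keras LSTM layers expect; y has shape (samples,)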

# --- 4. Train-Test Split ---
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

# --- 5. Model Architecture ---
model = Sequential([
    # Input LSTM layer
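    # return_sequences=True emits the hidden state at every timestep, so the next LSTM layer also receives a sequence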
    LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.2),
    # Second LSTM layer
    LSTM(units=50, return_sequences=False),
    Dropout(0.2),
    # Dense layer
    Dense(units=25, activation='relu'),
    # Output layer: 1 neuron for the single predicted volatility value
    Dense(units=1)
])
model.compile(optimizer='adam', loss='mean_squared_error')
model.summary()

# --- 6. Training & Evaluation ---
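# Stop training once validation loss has not improved for 10 consecutive epochs, then restore the best weights seen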
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.1,
    callbacks=[early_stopping],
    verbose=1
)

# Make predictions on the test set
y_pred = model.predict(X_test).flatten()

# Evaluate
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"\nTest Set RMSE: {rmse:.4f}")

# Visualize predictions
test_index = spy.index[train_size + SEQ_LENGTH:]
plt.figure(figsize=(14, 7))
plt.plot(test_index, y_test, label='Actual Volatility', color='black', alpha=0.7)
plt.plot(test_index, y_pred, label=f'LSTM Forecast (RMSE={rmse:.4f})', color='red', linestyle='--')
plt.title('21-Day Volatility Forecast vs. Actual (LSTM)')
plt.legend()
plt.show()

Interpreting the Results

After running the LSTM model, a quant would compare its RMSE against simpler models like GARCH(1,1) or an XGBoost model on engineered features (from the Module 4 capstone); a quick sketch of one such baseline comparison follows these bullets.

  • Is the LSTM better? If the LSTM's RMSE is significantly lower, it suggests that there are complex, non-linear, long-range patterns in the sequence of returns that the LSTM was able to capture automatically, but which the other models missed.
  • Is it worth it? Even if the LSTM performs slightly better, the quant must weigh the tradeoff. The LSTM is a "black box" that is computationally expensive and slow to train; the GARCH model is fast and highly interpretable; the XGBoost model is a good middle ground. The decision to deploy the LSTM depends on whether the small performance gain justifies the added complexity and loss of interpretability.
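
As a quick sanity check on that comparison, the sketch below scores a naive "persistence" baseline that simply forecasts the next 21-day volatility to equal the trailing 21-day volatility. It assumes the variables from the Part 3 script (`spy`, `train_size`, `SEQ_LENGTH`, `y_test`, `rmse`, `mean_squared_error`) are still in scope; a GARCH or XGBoost baseline would slot in the same way.

# Naive persistence baseline: use the trailing 21-day realized vol as the forecast of the next 21 days
naive_pred = spy['realized_vol_21d'].values[train_size + SEQ_LENGTH:]  # aligned with y_test by construction
naive_rmse = np.sqrt(mean_squared_error(y_test, naive_pred))
print(f"Persistence baseline RMSE: {naive_rmse:.4f}  |  LSTM RMSE: {rmse:.4f}")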

Congratulations! You Have Completed Module 8

You have now completed the entire journey of learning from sequential data. You understand the classical statistical approach (ARIMA/GARCH), the feature-engineering machine learning approach (XGBoost), and the state-of-the-art end-to-end deep learning approach (LSTM/Transformers).

You have the complete toolkit for tackling one of the most common and important problems in quantitative finance: forecasting.

What's Next in Your Journey?

We've focused heavily on structured, numerical data. But a huge amount of financial "alpha" is hidden in unstructured text: news articles, earnings call transcripts, social media posts, and SEC filings. How do we turn words into numbers that a model can understand?

In **Module 9: The Language of Alpha: NLP in Finance**, we will dive into the world of Natural Language Processing, from classic Bag-of-Words models to modern Transformer-based embeddings like BERT.