Lesson 4.5: Capstone: A Pairs Trading Strategy
This capstone lesson synthesizes the module by building a complete quantitative pairs trading strategy from start to finish, using the concept of cointegration.
The Workflow
The Quant Workflow
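- Formation Period: Use historical data to find a cointegrated pair of assets (e.g., using the Engle-Granger test) and estimate their long-run relationship ($Y_t = \alpha + \beta X_t + \epsilon_t$, where $\beta$ is the hedge ratio).
- Trading Period: On a separate, out-of-sample period, calculate the spread ($S_t = Y_t - \beta X_t$) and its Z-score, standardizing with the spread's mean and standard deviation from the formation period.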
- Signal Generation: Define entry/exit rules based on the Z-score. For example, short the spread (sell the first asset, buy the hedge-ratio-weighted amount of the second) if Z-score > 2, go long the spread if Z-score < -2, and exit when the Z-score crosses 0.
- Backtesting: Simulate the strategy and evaluate its performance (e.g., cumulative returns, Sharpe ratio).
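The signal-generation step can be sketched as a small state machine over the Z-score series. This is a minimal illustration, not a definitive implementation: the function name `zscore_positions`, its parameters, and the toy Z-score values are hypothetical, and the thresholds are the example values of 2 and 0 from the list above.

```python
import numpy as np
import pandas as pd


def zscore_positions(z, entry=2.0, exit_level=0.0):
    """Map a Z-score series to spread positions: +1 long, -1 short, 0 flat.

    Hypothetical helper for illustration; thresholds follow the example rules.
    """
    positions = []
    current = 0
    for value in z:
        if np.isnan(value):
            pass  # no information: keep the current position
        elif current == 0:
            if value > entry:
                current = -1  # spread unusually high: short it
            elif value < -entry:
                current = 1   # spread unusually low: go long
        elif current == 1 and value >= exit_level:
            current = 0       # long position: exit once Z-score is back at 0
        elif current == -1 and value <= exit_level:
            current = 0       # short position: exit once Z-score is back at 0
        positions.append(current)
    return pd.Series(positions, index=z.index, name="position")


# Toy Z-scores (made up) to show one short trade and one long trade
z = pd.Series([0.5, 2.3, 1.1, -0.2, -2.5, -1.0, 0.1, 0.4])
positions = zscore_positions(z)
print(positions.tolist())  # [0, -1, -1, 0, 1, 1, 0, 0]
```

A loop is used rather than a vectorized expression because the position on each day depends on the position held the day before. In this sketch an exit and a new entry never happen on the same bar.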
Python Implementation
Example: Pairs Trading BMO and BNS
import pandas as pd
import yfinance as yf
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint
# Define assets and time periods
asset1, asset2 = 'BMO', 'BNS'
formation_start, formation_end = '2015-01-01', '2020-12-31'
trading_start, trading_end = '2021-01-01', '2023-12-31'
# Download and split data
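# auto_adjust=True puts split- and dividend-adjusted prices in 'Close';
# recent yfinance releases no longer return an 'Adj Close' column by default
df = yf.download([asset1, asset2], start=formation_start, end=trading_end, auto_adjust=True)['Close'].dropna()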
formation_df = df.loc[formation_start:formation_end]
trading_df = df.loc[trading_start:trading_end]
# Test for cointegration
p_value = coint(formation_df[asset1], formation_df[asset2])[1]
print(f"Cointegration Test p-value: {p_value:.4f}")
# Estimate hedge ratio
X = sm.add_constant(formation_df[asset2])
model = sm.OLS(formation_df[asset1], X).fit()
beta = model.params[asset2]
# Calculate spread stats from formation period
spread_formation = formation_df[asset1] - beta * formation_df[asset2]
spread_mean = spread_formation.mean()
spread_std = spread_formation.std()
# Calculate Z-score for trading period
spread_trading = trading_df[asset1] - beta * trading_df[asset2]
z_score = (spread_trading - spread_mean) / spread_std
# (Simple backtesting logic would follow...)
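One way the backtesting step might look is sketched below. It is a simplified illustration under stated assumptions: the function `backtest_spread` and the toy prices and positions are hypothetical, P&L is measured in currency units per one unit of spread rather than as a percentage return, the risk-free rate is taken as zero, and transaction costs are ignored.

```python
import numpy as np
import pandas as pd


def backtest_spread(price1, price2, beta, position, trading_days=252):
    """Daily P&L from holding `position` units of the spread price1 - beta * price2.

    Hypothetical helper: no costs, zero risk-free rate, P&L per unit of spread.
    """
    spread = price1 - beta * price2
    # The position chosen at yesterday's close earns today's change in the spread,
    # which avoids trading on information that was not yet available.
    daily_pnl = (position.shift(1) * spread.diff()).fillna(0.0)
    cumulative_pnl = daily_pnl.cumsum()
    vol = daily_pnl.std()
    sharpe = np.sqrt(trading_days) * daily_pnl.mean() / vol if vol > 0 else np.nan
    return daily_pnl, cumulative_pnl, sharpe


# Toy data (made up): the spread rises to 7, is shorted there, then falls back to 5
price1 = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0])
price2 = pd.Series([5.0, 5.0, 5.0, 5.0, 5.0])
position = pd.Series([0, 0, -1, -1, 0])  # -1 = short the spread

daily_pnl, cumulative_pnl, sharpe = backtest_spread(price1, price2, 1.0, position)
print(cumulative_pnl.iloc[-1])  # 2.0
print(f"Annualized Sharpe ratio: {sharpe:.2f}")
```

With the lesson's variables, the inputs would be `trading_df[asset1]`, `trading_df[asset2]`, the estimated `beta`, and a position series derived from `z_score`. Converting this P&L into cumulative returns requires choosing a capital base, which is a design decision left open here.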
What's Next? Machine Learning for Time Series
You have now completed the entire module on multivariate time series analysis.
The next logical step is to see how we can apply the powerful, non-linear models from machine learning (like XGBoost and LSTMs) to time series forecasting. In the final module, we will explore the feature engineering and validation techniques required to do this effectively.