Lesson 5.3: The Building Block of Memory: The Autoregressive (AR) Model

We are now ready to build our first forecasting model. This lesson introduces the Autoregressive (AR) model, the most intuitive time series model. It formalizes the idea of 'memory' by modeling the current value of a series as a linear combination of its own past values. This is the foundation of many sophisticated quantitative strategies.

Part 1: The Core Idea - Regressing on the Past

The name "Autoregressive" sounds complex, but the idea is beautifully simple. "Auto" means self, and "regressive" refers to regression. An autoregressive model is simply a regression of a time series on its own past values, called **lags**.

We are taking the familiar Ordinary Least Squares (OLS) framework from Module 4 and applying it in a new way. Instead of predicting $Y$ with a different explanatory variable $X$, we will predict $Y_t$ using $Y_{t-1}, Y_{t-2}$, etc.

The Core Analogy: Driving by Looking in the Rear-View Mirror

An AR model is like driving a car where your speed today is a function of your speed a few seconds ago. You are forecasting your immediate future based on your immediate past.

  • An **AR(1) model** is like saying, "My speed right now ($Y_t$) is some fraction ($\phi_1$) of my speed one second ago ($Y_{t-1}$), plus a random jolt ($\epsilon_t$)."
  • An **AR(2) model** is more sophisticated: "My speed right now is a combination of my speed one second ago and my speed two seconds ago, plus a random jolt."

The "order" of the AR model, denoted as pp in AR(p)AR(p), tells you how many rear-view mirrors you are looking into. It specifies how many past periods are included in the regression.

Part 2: The AR(1) Model - A Deep Dive

To fully understand the mechanics, we will start with the simplest case: the AR(1) model. This is the most important single model in all of time series analysis.

The AR(1) Model Specification

The value of the series $Y$ at time $t$ is a linear function of its value at time $t-1$, plus an error term.

$$Y_t = c + \phi_1 Y_{t-1} + \epsilon_t$$

  • $c$: The intercept or constant term.
  • $\phi_1$: The **autoregressive coefficient**. This is the key parameter. It measures the strength and direction of the relationship between consecutive observations.
  • $\epsilon_t$: The error term at time $t$. For a valid AR model, this must be **white noise** (mean zero, constant variance, and no autocorrelation).

The Stationarity Condition for an AR(1) Model

Not all AR(1) processes are stationary. The behavior of the model depends entirely on the value of $\phi_1$.

The Stationarity Condition

An AR(1) process is stationary if and only if the absolute value of the autoregressive coefficient is less than 1.

$$|\phi_1| < 1$$

  • If $0 < \phi_1 < 1$, shocks to the system are persistent but eventually die out. This is a common pattern in financial returns (e.g., momentum).
  • If $-1 < \phi_1 < 0$, the process oscillates around its mean: a positive value is likely to be followed by a negative one. This is an extreme form of mean reversion.
  • If $\phi_1 = 1$, the process is a **Random Walk** (or, with $c \neq 0$, a random walk with drift), which is non-stationary. The shocks are permanent and never die out. Stock prices are often modeled this way.
  • If $|\phi_1| > 1$, the process is **explosive** and non-stationary. It will diverge to infinity, which is rarely seen in finance.

The simulation sketch below illustrates the first three regimes.
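To make these regimes concrete, here is a minimal simulation sketch. The coefficient values, sample size, and seed are illustrative choices, not taken from any data in this lesson:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 200
shocks = np.random.normal(size=n)

def simulate_ar1(phi, eps):
    # Iterate Y_t = phi * Y_{t-1} + eps_t, starting from Y_0 = 0 (c = 0 for simplicity)
    y = np.zeros(len(eps))
    for t in range(1, len(eps)):
        y[t] = phi * y[t-1] + eps[t]
    return y

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharex=True)
for ax, phi in zip(axes, [0.7, -0.7, 1.0]):
    ax.plot(simulate_ar1(phi, shocks))
    ax.set_title(f'phi_1 = {phi}')
plt.tight_layout()
plt.show()
# phi = 0.7: shocks decay gradually; phi = -0.7: oscillation around the mean;
# phi = 1.0: a random walk that wanders and never reverts.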

Part 3: The General AR(p) Model

We can easily extend the model to include $p$ lags.

The AR(p) Model Specification

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t$$

This can be written more compactly as:

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \epsilon_t$$

3.1 Model Identification with ACF and PACF

How do we choose the correct order, $p$? We use the signature "fingerprints" from the last lesson.

The Signature of an AR(p) Process

  • The **ACF plot** will show a pattern of **gradual decay**. The correlation with past values will slowly taper off because the influence of $Y_{t-p}$ is passed through all the intermediate lags ($Y_{t-p+1}, \dots, Y_{t-1}$).
  • The **PACF plot** will **cut off sharply** after lag $p$. There will be $p$ significant spikes, and then all subsequent spikes will be inside the significance boundaries.

The PACF plot is our primary tool for identifying the order of an AR model. The lag at which the PACF cuts off is our best guess for $p$.
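As a quick sketch of this identification step, the two plots can be drawn side by side. The simulated AR(2) series here (the same parameters used in Part 5 below) is purely for illustration:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate an AR(2) process: Y_t = 0.7*Y_{t-1} + 0.2*Y_{t-2} + eps_t
np.random.seed(42)
series = ArmaProcess(np.array([1, -0.7, -0.2]), np.array([1])).generate_sample(nsample=500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))
plot_acf(series, ax=ax1, lags=20)   # expect gradual decay
plot_pacf(series, ax=ax2, lags=20)  # expect a sharp cutoff after lag 2
plt.show()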

Part 4: Estimation and Forecasting

4.1 Estimation via OLS

For a stationary AR(p) process, the coefficients $c, \phi_1, \dots, \phi_p$ can be consistently estimated using Ordinary Least Squares (OLS). The lagged values of $Y$ are simply treated as the predictor variables (the $X$'s) in the regression.
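To see that this really is just OLS, here is a minimal sketch that estimates an AR(1) by regressing the series on its own first lag. The simulated data and variable names are illustrative assumptions:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulate an AR(1) with c = 0.5 and phi_1 = 0.6
np.random.seed(1)
n = 1000
y = np.zeros(n)
eps = np.random.normal(size=n)
for t in range(1, n):
    y[t] = 0.5 + 0.6 * y[t-1] + eps[t]

# Build the regression: Y_t on a constant and Y_{t-1}
df = pd.DataFrame({'y': y})
df['y_lag1'] = df['y'].shift(1)
df = df.dropna()

ols_fit = sm.OLS(df['y'], sm.add_constant(df['y_lag1'])).fit()
print(ols_fit.params)  # 'const' should be near 0.5, 'y_lag1' near 0.6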

4.2 Forecasting

Once we have our estimated coefficients ($\hat{c}, \hat{\phi}_1, \dots$), forecasting is straightforward. To forecast one step ahead ($\hat{Y}_{t+1}$), we plug in the most recent known values of the series.

For an AR(1) model, the one-step-ahead forecast made at time $t$ is:

$$\hat{Y}_{t+1|t} = \hat{c} + \hat{\phi}_1 Y_t$$

To forecast two steps ahead, we just plug our forecast back into the equation:

$$\hat{Y}_{t+2|t} = \hat{c} + \hat{\phi}_1 \hat{Y}_{t+1|t}$$

For a stationary process where $|\phi_1| < 1$, as the forecast horizon $h$ increases, the forecast will converge to the unconditional mean of the series, $E[Y] = c / (1 - \phi_1)$. This is a key property: the "memory" fades over time, and our best long-run forecast is just the average.
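A small sketch of this convergence, iterating the AR(1) forecast recursion by hand (the parameter values are illustrative, as if they came from an estimated model):

# Iterate the AR(1) forecast recursion and watch it approach c / (1 - phi_1)
c_hat, phi_hat = 0.5, 0.6   # pretend these are estimated coefficients
y_last = 5.0                # the most recent observed value, Y_t

forecast = y_last
for h in range(1, 11):
    forecast = c_hat + phi_hat * forecast
    print(f'h={h:2d}: {forecast:.4f}')

print('Unconditional mean:', c_hat / (1 - phi_hat))  # 0.5 / 0.4 = 1.25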

Part 5: Python Implementation - Building an AR Model

AR Model in Python with statsmodels

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_pacf

# --- Generate Sample AR(2) Data ---
np.random.seed(42)
from statsmodels.tsa.arima_process import ArmaProcess
ar_params = np.array([1, -0.7, -0.2]) # Note: statsmodels requires the AR params with opposite sign
ma_params = np.array([1])
ar_process = pd.Series(ArmaProcess(ar_params, ma_params).generate_sample(nsample=500), name='AR2_Process')

# --- 1. Check for Stationarity ---
# We know it's stationary by construction, but in real life, this is the first step.
adf_result = adfuller(ar_process)
print(f'ADF p-value: {adf_result[1]:.4f}')  # should be very low, rejecting a unit root

# --- 2. Identify the Order (p) with PACF ---
fig, ax = plt.subplots(figsize=(10, 5))
plot_pacf(ar_process, ax=ax, lags=20, title='PACF for our data')
plt.show()
# We expect the PACF to cut off sharply after lag 2. This suggests p=2.

# --- 3. Estimate the AR(2) Model ---
# Split data into train and test sets
train_data = ar_process[:-50]
test_data = ar_process[-50:]

# Fit the AutoReg model
# Note: 'lags=2' specifies the order of the model.
model = AutoReg(train_data, lags=2)
model_fit = model.fit()

# Print the model summary
print(model_fit.summary())

# The coefficients for L1 and L2 should be close to our true values of 0.7 and 0.2

# --- 4. Make Forecasts ---
# Forecast the next 50 periods
predictions = model_fit.predict(start=len(train_data), end=len(train_data)+len(test_data)-1, dynamic=False)

# --- 5. Evaluate the Forecast ---
plt.figure(figsize=(12, 6))
plt.plot(train_data.index[-200:], train_data.values[-200:], label='Training Data')
plt.plot(test_data.index, test_data.values, label='Actual Values (Test)', color='green')
plt.plot(predictions.index, predictions.values, label='Forecasts', color='red', linestyle='--')
plt.title('AR(2) Model Forecast vs Actual')
plt.legend()
plt.show()
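As a final, optional check on the model fitted above, the residuals should look like white noise, as required in Part 2. A minimal sketch using the Ljung-Box test (reusing `model_fit` from the script above):

from statsmodels.stats.diagnostic import acorr_ljungbox

# H0: residuals are uncorrelated up to the tested lag (i.e., consistent with white noise)
lb_test = acorr_ljungbox(model_fit.resid, lags=[10], return_df=True)
print(lb_test)  # a large p-value means we cannot reject white-noise residuals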

Part 6: Applications in Quant Finance & ML

6.1 Quantitative Finance: Mean Reversion Strategies

The AR(1) model is the mathematical heart of **mean-reversion** trading strategies. A quant might test a time series of the spread between two correlated stocks (e.g., Coke and Pepsi). If they fit an AR(1) model to this spread and find a coefficient statistically significantly less than 1 ($\hat{\phi}_1 < 1$), it's evidence that the spread is stationary and mean-reverting.

This means that when the spread widens (a positive deviation in $Y_{t-1}$), it is expected to narrow back toward its mean in the next period. The trading strategy would be to sell the outperforming stock and buy the underperforming one, betting that the spread will revert to its long-run mean. The speed of this reversion is governed by $\hat{\phi}_1$: the closer it is to zero, the faster a shock to the spread decays.
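To illustrate, here is a hedged sketch that fits an AR(1) to a simulated mean-reverting spread and converts $\hat{\phi}_1$ into a half-life. The spread series and its parameters are synthetic assumptions, not a real Coke/Pepsi spread:

import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate a mean-reverting spread: phi_1 = 0.9 implies slow reversion
np.random.seed(7)
n = 1000
spread = np.zeros(n)
for t in range(1, n):
    spread[t] = 0.05 + 0.9 * spread[t-1] + np.random.normal(scale=0.1)

fit = AutoReg(spread, lags=1).fit()
phi_hat = fit.params[1]  # index 0 is the constant, index 1 the lag coefficient

# A shock decays like phi^h, so its half-life solves phi^h = 0.5
half_life = np.log(0.5) / np.log(phi_hat)
print(f'phi_hat = {phi_hat:.3f}, half-life ~ {half_life:.1f} periods')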

6.2 Machine Learning: A Powerful Baseline

In any serious machine learning forecasting project, a simple AR model serves as a critical **baseline model**. Before building complex models like LSTMs or Gradient Boosting Machines, a good data scientist will first build a simple AR(p) model.

The performance of this AR model (e.g., its Mean Squared Error on a test set) becomes the benchmark. If your complex, computationally expensive neural network cannot significantly outperform the simple AR baseline, it's a strong sign that either your complex model is poorly tuned or that the underlying process is simple enough that the extra complexity is not warranted. It prevents you from deploying a complicated model that adds no real value.
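A minimal sketch of how such a benchmark might be computed, reusing the train/test split and forecasts from Part 5 (the naive comparison model is an illustrative choice):

import numpy as np

# Benchmark: mean squared error of the AR(2) forecasts from Part 5
ar_mse = np.mean((test_data.values - predictions.values) ** 2)

# A naive alternative: always predict the training-set mean
naive_mse = np.mean((test_data.values - train_data.mean()) ** 2)

print(f'AR(2) baseline MSE: {ar_mse:.4f}')
print(f'Naive (mean) MSE:   {naive_mse:.4f}')
# Any fancier model (LSTM, gradient boosting, ...) should beat ar_mse
# out-of-sample before it earns a place in production.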

What's Next? Modeling Shocks

The Autoregressive model has provided us with a powerful way to model the "memory" of a series—how past *values* influence the present value.

But what about the other source of dynamics? What about the random *shocks* or *errors* ($\epsilon_t$)? Can the shocks from the past also have a direct influence on the present value?

In the next lesson, we will explore this complementary idea by introducing the **Moving Average (MA) Model**, which models the present value as a function of past error terms.

Up Next: Let's Model the Shocks: The MA Model