Lesson 6.2: Testing for Stationarity: The ADF Test

Visual inspection is useful, but quants need rigor. This lesson introduces the Augmented Dickey-Fuller (ADF) test, the standard statistical procedure for formally testing whether a time series is stationary or contains a 'unit root' (as a random walk does) and therefore needs to be differenced.

Part 1: The Problem of the Random Walk

As we've discussed, most time series models require stationarity. The most common form of non-stationarity in finance is the **random walk**:

$$Y_t = Y_{t-1} + \epsilon_t$$

This can be rewritten as an AR(1) model where the coefficient is exactly 1: $Y_t = 1 \cdot Y_{t-1} + \epsilon_t$. When this coefficient is 1, we say the process has a **unit root**. A unit root process has "infinite memory" (the impact of a shock never dies out), which is the source of its non-stationarity.

The job of a unit root test is to determine if this coefficient is statistically distinguishable from 1.
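
To make the "infinite memory" idea concrete, the short sketch below traces a single unit shock through an AR(1) recursion for two coefficient values. The helper name `impulse_response` and the value 0.8 for the stationary case are illustrative assumptions, not part of the lesson's own material.

```python
def impulse_response(phi, horizon):
    """Effect of a one-time unit shock at t=0 on Y_t for t = 0..horizon,
    under Y_t = phi * Y_{t-1} + eps_t with no further shocks."""
    path = [1.0]
    for _ in range(horizon):
        path.append(phi * path[-1])
    return path

# phi = 1.0 is the unit root case; phi = 0.8 is an arbitrary stationary example
unit_root = impulse_response(phi=1.0, horizon=20)
stationary = impulse_response(phi=0.8, horizon=20)

print(f"phi = 1.0, effect after 20 steps: {unit_root[-1]:.4f}")
print(f"phi = 0.8, effect after 20 steps: {stationary[-1]:.4f}")
```

With a coefficient of 1 the shock is still at full size after 20 steps, while with 0.8 it has decayed to roughly 1% of its original size.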

Part 2: The Dickey-Fuller Test

The original Dickey-Fuller test makes this explicit by transforming the general AR(1) equation $Y_t = \phi_1 Y_{t-1} + \epsilon_t$. Subtract $Y_{t-1}$ from both sides:

$$Y_t - Y_{t-1} = (\phi_1 - 1) Y_{t-1} + \epsilon_t$$

$$\Delta Y_t = \gamma Y_{t-1} + \epsilon_t$$

where $\gamma = \phi_1 - 1$. Now, testing whether $\phi_1 = 1$ is the same as testing whether $\gamma = 0$.

The Dickey-Fuller Hypotheses

Null Hypothesis (H₀): $\gamma = 0$. A unit root is present. The series is **non-stationary**.
Alternative Hypothesis (H₁): $\gamma < 0$. No unit root. The series is **stationary**.

We can estimate this regression using OLS and perform a t-test on $\hat{\gamma}$. However, under the null hypothesis, the t-statistic does *not* follow a standard t-distribution. It follows a special "Dickey-Fuller distribution," so we must compare our statistic to special critical values.
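
A small Monte Carlo sketch can illustrate why special critical values are needed. It assumes the simplest version of the regression above (no constant or trend term), and the function name `df_t_statistic`, the seed, and the simulation sizes are all illustrative choices rather than anything prescribed by the test itself.

```python
import numpy as np

def df_t_statistic(y):
    """t-statistic on gamma in  dY_t = gamma * Y_{t-1} + eps_t  (no constant)."""
    y = np.asarray(y, dtype=float)
    y_lag = y[:-1]                                  # Y_{t-1}
    dy = np.diff(y)                                 # Y_t - Y_{t-1}
    gamma_hat = (y_lag @ dy) / (y_lag @ y_lag)      # single-regressor OLS slope
    resid = dy - gamma_hat * y_lag
    sigma2 = (resid @ resid) / (len(dy) - 1)        # residual variance
    se = np.sqrt(sigma2 / (y_lag @ y_lag))          # standard error of gamma_hat
    return gamma_hat / se

rng = np.random.default_rng(0)
n_sims, n_obs = 2000, 250

# Simulate random walks (so the null hypothesis is true) and collect t-statistics
stats = np.array([
    df_t_statistic(rng.standard_normal(n_obs).cumsum())
    for _ in range(n_sims)
])

print(f"Simulated 5% quantile: {np.quantile(stats, 0.05):.2f}")
print(f"Share below -1.645:    {np.mean(stats < -1.645):.3f}")
```

Under these assumptions the simulated 5% quantile should land near the tabulated Dickey-Fuller value of about -1.95 for this case, noticeably below the -1.645 that a normal approximation would suggest. Using the normal cutoff would therefore reject a true unit root more often than 5% of the time.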

Part 3: The 'Augmented' Dickey-Fuller (ADF) Test

The basic Dickey-Fuller test assumes the error term $\epsilon_t$ is white noise. In real data the errors are often serially correlated, which makes the Dickey-Fuller critical values unreliable. The **Augmented** Dickey-Fuller (ADF) test accounts for this by adding lagged values of the dependent variable ($\Delta Y_t$) to the regression, so that these lags absorb the serial correlation.

The ADF Test Regression

$$\Delta Y_t = \gamma Y_{t-1} + \delta_1 \Delta Y_{t-1} + \delta_2 \Delta Y_{t-2} + \dots + \epsilon_t$$
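
As a rough illustration of how this regression can be assembled by hand, the sketch below builds the design matrix with NumPy and estimates it by OLS. The names `adf_regression` and `n_lags` are hypothetical, and the sketch deliberately omits the constant term that `adfuller` includes by default, so its numbers are not expected to match the library's output.

```python
import numpy as np

def adf_regression(y, n_lags):
    """OLS fit of  dY_t = gamma*Y_{t-1} + sum_i delta_i*dY_{t-i} + eps_t.
    Returns (gamma_hat, t-statistic on gamma). No constant or trend term."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[n_lags:]                             # dY_t
    level = y[n_lags:-1]                             # Y_{t-1}
    lagged = [dy[n_lags - i:len(dy) - i]             # dY_{t-i} for i = 1..n_lags
              for i in range(1, n_lags + 1)]
    X = np.column_stack([level] + lagged)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS covariance of beta
    return beta[0], beta[0] / np.sqrt(cov[0, 0])

rng = np.random.default_rng(1)
walk = rng.standard_normal(500).cumsum()             # simulated random walk
gamma_hat, t_stat = adf_regression(walk, n_lags=2)
print(f"gamma_hat = {gamma_hat:.4f}, t-statistic = {t_stat:.2f}")
```

The t-statistic on the first coefficient is the ADF statistic for this specification. It still has to be compared against Dickey-Fuller critical values, not the usual t-table.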

The User's Guide to the ADF Test

In practice, the library handles the regression and the critical values for you. You just need to know how to use the function and interpret its output.

Run the Test: Use a library function like `adfuller` from `statsmodels`.
Examine the p-value: This is the only number you really need to look at.
Apply the Decision Rule:
- If the **p-value > 0.05**: You **fail to reject** the null hypothesis. You cannot rule out a unit root, so treat the series as non-stationary and difference it.
- If the **p-value <= 0.05**: You **reject** the null hypothesis of a unit root. The evidence supports stationarity, and you can proceed with modeling.

Part 4: Python Implementation

import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller

# --- Create a non-stationary random walk ---
np.random.seed(42)
non_stationary_series = pd.Series(np.random.randn(500).cumsum(), name='Random Walk')

# --- Create a stationary series (the first difference) ---
stationary_series = non_stationary_series.diff().dropna()

def perform_adf_test(series, name):
    """Run the ADF test on a series and print the key results."""
    print(f"--- ADF Test Results for: {name} ---")
    # adfuller returns (statistic, p-value, lags used, n obs, critical values, ...)
    result = adfuller(series)
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'\t{key}: {value:.4f}')

    # Apply the 5% decision rule to the p-value
    if result[1] <= 0.05:
        print("=> Conclusion: Reject the null hypothesis. The series is stationary.")
    else:
        print("=> Conclusion: Fail to reject the null hypothesis. The series is non-stationary.")

# Test the non-stationary series
perform_adf_test(non_stationary_series, 'Original Random Walk')

print("\n" + "="*50 + "\n")

# Test the stationary series
perform_adf_test(stationary_series, 'Differenced Random Walk')

What's Next? Building the Models

We now have a rigorous, formal procedure for checking whether our data is stationary. We are finally ready to start building forecasting models.

The next lesson, **Classical Models I**, will introduce the full family of ARIMA models. We will see how the ACF and PACF plots we learned about earlier guide our choice of model structure (the `p` and `q` orders), while the ADF test guides our choice of the differencing order (`d`).
