Lesson 5.2: The Detective's Tools: The ACF and PACF

Now that we have stationary data, how do we diagnose its internal 'memory' structure? This lesson introduces the two essential tools for this task: the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF). These plots are the 'fingerprints' of a time series, and learning to read them is the key to specifying the correct forecasting model.

Part 1: The Core Idea - Measuring 'Self-Correlation'

The concept of correlation measures how two *different* variables move together. **Autocorrelation** is simply the correlation of a time series with a **lagged version of itself**. It answers the question: "How much does the value of my series today correlate with its value yesterday, the day before, and so on?"
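To make this concrete, the lag-k autocorrelation can be computed straight from its definition. The sketch below (the helper name `autocorr` and the simulated AR(1) series are ours, purely for illustration) correlates a series with a shifted copy of itself:

```python
import numpy as np

def autocorr(y, k):
    """Lag-k sample autocorrelation, computed from the definition (k >= 1)."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    num = np.sum((y[k:] - ybar) * (y[:-k] - ybar))  # co-movement with the lagged copy
    den = np.sum((y - ybar) ** 2)                    # total variation of the series
    return num / den

# Simulate a series with known memory: Y_t = 0.7*Y_{t-1} + e_t
rng = np.random.default_rng(0)
e = rng.standard_normal(1000)
y = np.empty(1000)
y[0] = e[0]
for t in range(1, 1000):
    y[t] = 0.7 * y[t - 1] + e[t]

print(round(autocorr(y, 1), 2))  # close to 0.7, the strength of the one-step memory
```

In practice you would let statsmodels compute this for every lag at once, but seeing the formula spelled out makes the plots easier to interpret.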

We use two different tools to measure this, because they tell us two different stories about the nature of the memory.

The Core Analogy: An Echo vs. a Direct Word

Imagine shouting "HELLO!" in a canyon. You hear a complex series of echoes.

  • The **Autocorrelation Function (ACF)** is like the full, complete recording of the echo you hear. It captures the sound of your direct shout from 1 second ago, but it also captures the reflection of your shout off the far wall, which then bounced off another wall before reaching you. The ACF measures the **total, combined correlation**, including all direct and indirect effects.
  • The **Partial Autocorrelation Function (PACF)** is like having a sophisticated microphone that is programmed to **filter out all the intermediate echoes**. It only measures the correlation between you now and your direct shout from a past moment, removing the influence of all the "bounces" that happened in between.

Part 2: The Autocorrelation Function (ACF)

The ACF plot is a bar chart showing the correlation of the series with its lags. The bar at lag $k$ shows $\text{Corr}(Y_t, Y_{t-k})$.

The ACF Plot: A Visual Guide

Imagine a bar chart: the x-axis is 'Lag', the y-axis is 'Correlation', bars extend up or down from the zero line, and a blue shaded area marks the significance boundary.

How to Read an ACF Plot:

  1. The y-axis is the correlation value (from -1 to 1).
  2. The x-axis is the lag number ($k$).
  3. The bar at lag 0 is always 1 (a series is perfectly correlated with itself). We ignore it.
  4. The blue shaded area is the **significance boundary**. Any bar that extends beyond this boundary represents a statistically significant autocorrelation.
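The width of that boundary is not arbitrary: under the null hypothesis that the series is pure white noise, each sample autocorrelation is approximately normal with standard error $1/\sqrt{n}$, so the 95% band sits near $\pm 1.96/\sqrt{n}$. A quick sanity check on pure noise (a minimal sketch, assuming statsmodels is installed):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(42)
noise = rng.standard_normal(500)  # white noise: no genuine memory at any lag

r = acf(noise, nlags=20)                 # r[0] is lag 0 and always equals 1
bound = 1.96 / np.sqrt(len(noise))       # approximate 95% band under the white-noise null

# Lags whose sample autocorrelation pokes outside the band (lag 0 excluded)
significant = [k for k in range(1, 21) if abs(r[k]) > bound]
print(round(bound, 3), significant)  # for white noise, expect roughly 1 in 20 lags by chance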

Part 3: The Partial Autocorrelation Function (PACF)

The PACF at lag $k$ gives the correlation between $Y_t$ and $Y_{t-k}$ after **removing the linear effects of all the intervening lags** ($Y_{t-1}, Y_{t-2}, \dots, Y_{t-k+1}$). It's a measure of the *direct* relationship that isn't explained by shorter-term correlations.

Example: $\text{PACF}(3)$ measures the direct link between today and 3 days ago, after stripping out the indirect influence that flows through the two intervening days.

The PACF plot is read in the exact same way as the ACF plot, looking for significant spikes that extend beyond the confidence interval.
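One way to see exactly what "removing the intervening lags" means: the PACF at lag $k$ equals the coefficient on $Y_{t-k}$ in an OLS regression of $Y_t$ on its first $k$ lags. The sketch below (the helper name `pacf_by_regression` and the simulated AR(1) are ours) shows that for an AR(1), only lag 1 carries a direct effect:

```python
import numpy as np

# Simulate an AR(1): Y_t = 0.7*Y_{t-1} + e_t, so lag 2 has no *direct* effect
rng = np.random.default_rng(1)
e = rng.standard_normal(2000)
y = np.empty(2000)
y[0] = e[0]
for t in range(1, 2000):
    y[t] = 0.7 * y[t - 1] + e[t]

def pacf_by_regression(y, k):
    """PACF at lag k = coefficient on Y_{t-k} when regressing Y_t on lags 1..k."""
    X = np.column_stack([y[k - j: len(y) - j] for j in range(1, k + 1)])
    X = np.column_stack([np.ones(len(X)), X])   # intercept + k lagged columns
    target = y[k:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta[-1]                             # coefficient on the deepest lag

print(round(pacf_by_regression(y, 1), 2))  # near 0.7: the direct one-step effect
print(round(pacf_by_regression(y, 2), 2))  # near 0: lag 2 adds nothing once lag 1 is in
```

The strong lag-2 *autocorrelation* in such a series (roughly $0.7^2$) is entirely inherited through lag 1, which is why it vanishes from the PACF.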

Part 4: The 'Fingerprints' - Identifying Model Structure

The real power of these two plots comes from using them together. The patterns of decay and cutoff in the ACF and PACF act as "fingerprints" that help us identify the underlying structure of our time series. This is the heart of the Box-Jenkins methodology we will learn later.

The Two Key 'Fingerprints'

Signature 1: An Autoregressive (AR) Process

  • ACF: Tails off gradually (exponential decay or damped sine wave).
  • PACF: Cuts off sharply after lag $p$.

Conclusion: This is the signature of an **AR(p) model**. The PACF tells you the order, $p$.

Signature 2: A Moving Average (MA) Process

  • ACF: Cuts off sharply after lag $q$.
  • PACF: Tails off gradually.

Conclusion: This is the signature of an **MA(q) model**. The ACF tells you the order, $q$.

If both plots tail off, it suggests a mixed **ARMA(p,q) model** is needed.
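These fingerprints can be checked exactly, free of sampling noise, because statsmodels' `ArmaProcess` can return the *theoretical* ACF and PACF of a specified process. A short sketch using the same AR(2) and MA(1) processes that the simulation code in Part 5 works with:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Theoretical (population) ACF and PACF -- no sampling noise at all.
ar2 = ArmaProcess(np.array([1, -0.7, 0.2]), np.array([1]))  # AR(2): Y_t = 0.7*Y_{t-1} - 0.2*Y_{t-2} + e_t
ma1 = ArmaProcess(np.array([1]), np.array([1, 0.6]))        # MA(1): Y_t = e_t + 0.6*e_{t-1}

print(np.round(ar2.acf(lags=6), 3))   # tails off gradually: 1, 0.583, 0.208, ...
print(np.round(ar2.pacf(lags=6), 3))  # cuts off: exactly zero beyond lag 2
print(np.round(ma1.acf(lags=6), 3))   # cuts off: exactly zero beyond lag 1
print(np.round(ma1.pacf(lags=6), 3))  # tails off gradually, with alternating signs
```

In the sample plots of Part 5 these zeros become "small bars inside the blue band" rather than exact zeros, which is why the significance boundary matters when reading real data.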

Part 5: Python Implementation

Plotting ACF/PACF in Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess

# --- Generate sample AR(2) and MA(1) data to see the patterns ---
np.random.seed(42)

# AR(2) Process: Y_t = 0.7*Y_{t-1} - 0.2*Y_{t-2} + e_t
ar_params = np.array([1, -0.7, 0.2])
ma_params_ar = np.array([1])
ar_process = pd.Series(ArmaProcess(ar_params, ma_params_ar).generate_sample(nsample=500))

# MA(1) Process: Y_t = 0.6*e_{t-1} + e_t
ma_params_ma = np.array([1, 0.6])
ar_params_ma = np.array([1])
ma_process = pd.Series(ArmaProcess(ar_params_ma, ma_params_ma).generate_sample(nsample=500))

# --- Plot the 'Fingerprints' ---

# For the AR(2) process
fig_ar, ax_ar = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(ar_process, ax=ax_ar[0], title='AR(2) Process - ACF (Tails off)')
plot_pacf(ar_process, ax=ax_ar[1], title='AR(2) Process - PACF (Cuts off at lag 2)')
plt.tight_layout()
plt.show()

# For the MA(1) process
fig_ma, ax_ma = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(ma_process, ax=ax_ma[0], title='MA(1) Process - ACF (Cuts off at lag 1)')
plot_pacf(ma_process, ax=ax_ma[1], title='MA(1) Process - PACF (Tails off)')
plt.tight_layout()
plt.show()

What's Next? Building the Models

We now have our diagnostic tools. We know how to check for stationarity, and if the series is stationary, we know how to use the ACF and PACF plots to diagnose its internal memory structure.

We are finally ready to start building predictive models based on these diagnoses.

In the next lesson, we will introduce our first forecasting model, the **Autoregressive (AR) Model**, and see how it formalizes the idea of regressing a series on its own past.

Up Next: The Building Block of Memory: The AR Model