Lesson 6.1: The Language of Time Series

Welcome to Module 6. We now enter the fourth dimension of data: Time. In this foundational lesson, we will learn the essential vocabulary for describing data that evolves sequentially. We will explore the critical concepts of Stationarity, Autocorrelation (ACF), and Partial Autocorrelation (PACF)—the tools that allow us to 'read' the memory of a time series.

Part 1: The Bedrock - Stationarity

Most classical time series models (like ARMA) require the data to be **stationary**. This is the single most important prerequisite. A time series is stationary if its statistical properties do not change over time.

The Three Conditions for (Weak) Stationarity

  1. Constant Mean: The series fluctuates around a consistent average value. It has no trend.
  2. Constant Variance: The "wiggliness" or volatility of the series is consistent over time.
  3. Constant Autocovariance: The relationship between an observation and its lagged values depends only on the lag, not on the time at which it's observed.
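To make the first two conditions concrete, here is a minimal sketch (using NumPy and simulated data, not any real series) that contrasts white noise, which is stationary, with a random walk, whose variance grows over time:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# White noise: constant mean and variance -> (weakly) stationary.
white_noise = rng.normal(loc=0.0, scale=1.0, size=n)

# Random walk: cumulative sum of white noise -> variance grows with time.
random_walk = np.cumsum(rng.normal(loc=0.0, scale=1.0, size=n))

def half_variances(x):
    """Compare the sample variance of the first and second half of a series."""
    half = len(x) // 2
    return np.var(x[:half]), np.var(x[half:])

wn_first, wn_second = half_variances(white_noise)
rw_first, rw_second = half_variances(random_walk)

# The white-noise halves have similar variance; the random walk's typically do not.
print(f"white noise halves: {wn_first:.2f} vs {wn_second:.2f}")
print(f"random walk halves: {rw_first:.2f} vs {rw_second:.2f}")
```

Splitting the sample in half is only an informal check, but it captures the intuition: for a stationary series the statistics of any window look alike.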

Why does it matter? A model trained on a stationary series has a chance of being useful in the future, because the "rules of the game" are stable. A model trained on a non-stationary series (like a stock price) will simply learn the historical trend, which is useless for predicting future changes.
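The standard fix for a trending series like a price is differencing. A short sketch with a simulated price path (a stand-in for real data) shows how taking differences of log prices recovers the returns, a roughly stationary series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated price path: a random walk in log space (hypothetical data).
true_returns = rng.normal(loc=0.0005, scale=0.01, size=500)
prices = 100 * np.exp(np.cumsum(true_returns))

# Differencing the log prices turns the non-stationary price series
# into a (roughly) stationary series of log returns.
log_returns = np.diff(np.log(prices))

print(prices[:3])
print(log_returns[:3])
```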

Part 2: The Detective's Tools - ACF and PACF

Once we have a stationary series (e.g., stock returns, not prices), we need to diagnose its internal structure or "memory." We use two primary tools for this: the ACF and PACF plots.

The Core Analogy: An Echo vs. a Direct Word

Imagine shouting in a canyon.

  • The **Autocorrelation Function (ACF)** is the full **echo** you hear. It contains your original shout plus its reflections. It measures the *total* correlation between a point and its past values, including all indirect effects.
  • The **Partial Autocorrelation Function (PACF)** is like a special microphone that only hears the **direct word** traveling from a past point to the present, filtering out all the intermediate echoes. It measures the *direct* correlation after removing the influence of shorter lags.

Reading the Plots

Both plots show correlation values at different lags. The most important feature is the shaded band (blue in the default statsmodels plots), which marks the approximate 95% **significance boundary**. Any spike that extends beyond this band is statistically significant at roughly that level; spikes inside it are indistinguishable from noise.
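That boundary has a simple closed form: under the null hypothesis that the series is white noise, sample autocorrelations are approximately normal with variance 1/N, so the band sits at about ±1.96/√N. A quick sketch (with a hypothetical sample size):

```python
import numpy as np

n_obs = 400  # hypothetical sample size

# Under the white-noise null, sample autocorrelations are approximately
# N(0, 1/n), so the 95% significance boundary is about +/- 1.96 / sqrt(n).
bound = 1.96 / np.sqrt(n_obs)
print(f"+/- {bound:.3f}")
```

Note this is the simplest version of the band; plotting libraries may widen it at higher lags using Bartlett's formula.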

The "Fingerprints" for Model Selection:

  • AR(p) Process (Autoregressive): The ACF tails off slowly, while the PACF cuts off sharply after lag `p`.
  • MA(q) Process (Moving Average): The ACF cuts off sharply after lag `q`, while the PACF tails off slowly.

What's Next? Formally Testing for Bedrock

We now have the core concepts for analyzing a time series. But simply "eyeballing" a chart to check for stationarity is not rigorous enough for quantitative finance.

In the next lesson, we will learn the formal statistical test for stationarity: the **Augmented Dickey-Fuller (ADF) Test**. This test is the practitioner's definitive tool for determining if a series has a "unit root" and requires differencing before it can be modeled.