Lesson 2.5: The Box-Jenkins Methodology

A systematic, iterative process for identifying, estimating, and validating ARIMA models.

The Four Phases of Modeling

The Core Analogy: A Detective Solving a Case

  1. Identification (Gathering Clues): Examine the data, test for stationarity (ADF test), difference if necessary, and then use ACF/PACF plots on the stationary series to form an initial hypothesis about the model's order (p, d, q).
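  2. Estimation (Building a Profile): Fit several candidate ARIMA models (e.g., ARIMA(1,1,1), ARIMA(2,1,0)) to the data using Maximum Likelihood Estimation. Compare their AIC/BIC scores and prefer the model with the lowest value, which rewards goodness of fit while penalizing extra parameters. These scores are only comparable between models that share the same order of differencing d.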
  3. Diagnostic Checking (Verifying the Theory): Examine the residuals of your best model. They should behave like white noise, with no autocorrelation left over. Check the ACF plot of the residuals for significant spikes and apply a formal test such as the Ljung-Box test, whose null hypothesis is that the residuals are uncorrelated (so a small p-value signals a problem). If the residuals have structure, your model is misspecified and you must return to Step 1.
  4. Forecasting (Predicting the Next Move): Once the model is validated, use it to make out-of-sample forecasts.
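
The identification and diagnostic steps above rest on a few simple statistics. The following is a minimal sketch in plain Python (standard library only) of differencing, the sample ACF, and the Ljung-Box Q statistic. It is not a replacement for a tested library such as statsmodels. The helper names `difference`, `sample_acf`, and `ljung_box_q` are hypothetical, and the simulated random walk is illustrative data, not part of the lesson.

```python
import math
import random


def difference(series, d=1):
    """Apply first differencing d times (the 'd' in ARIMA(p, d, q))."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(series)
    r = sample_acf(series, max_lag)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, max_lag + 1))


# Illustrative data (an assumption): a random walk, which is non-stationary.
random.seed(0)
walk, level = [], 0.0
for _ in range(500):
    level += random.gauss(0.0, 1.0)
    walk.append(level)

# Identification: difference once, then inspect the ACF of the result.
increments = difference(walk, d=1)
band = 1.96 / math.sqrt(len(increments))  # approximate 95% band for white noise
print("lag-1 ACF, raw walk:     ", round(sample_acf(walk, 1)[1], 3))
print("lag-1 ACF, differenced:  ", round(sample_acf(increments, 1)[1], 3))
print("approx. 95% band: +/-", round(band, 3))

# Diagnostic checking: compare Q with the chi-square 5% critical value.
# For h = 10 lags on a raw series that value is about 18.307; for residuals
# of a fitted ARMA(p, q) the degrees of freedom drop to h - p - q.
print("Ljung-Box Q (10 lags):", round(ljung_box_q(increments, 10), 3))
```

Autocorrelations that stay inside the band, together with a Q statistic below the critical value, are consistent with white noise. A slowly decaying ACF on the raw series is the usual hint that differencing is needed.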

What's Next? Modeling Volatility

The ARIMA framework is a complete toolkit for modeling and forecasting the **conditional mean** of a time series.

However, it is built on a crucial assumption that is almost always violated in financial markets: that the variance of the error term, $\sigma^2$, is constant.
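
One informal way to see when this assumption fails is to compare the autocorrelation of the errors with the autocorrelation of their squares. The sketch below is a hedged illustration in plain Python. The function `simulate_clustered_shocks` and its parameters `omega` and `alpha` are hypothetical, chosen only to produce shocks whose variance depends on the previous shock; they are not taken from the lesson.

```python
import math
import random


def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def simulate_clustered_shocks(n, omega=0.5, alpha=0.5, seed=1):
    """Illustrative shocks whose variance rises after a large previous shock."""
    rng = random.Random(seed)
    shocks, prev = [], 0.0
    for _ in range(n):
        variance = omega + alpha * prev ** 2  # variance changes over time
        prev = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        shocks.append(prev)
    return shocks


shocks = simulate_clustered_shocks(2000)
squared = [e * e for e in shocks]
band = 1.96 / math.sqrt(len(shocks))  # approximate 95% band for white noise

print("lag-1 ACF of shocks:        ", round(sample_acf(shocks, 1)[1], 3))
print("lag-1 ACF of squared shocks:", round(sample_acf(squared, 1)[1], 3))
print("approx. 95% band: +/-", round(band, 3))
```

If the shocks themselves look uncorrelated while their squares show clear autocorrelation, the mean model may be adequate but the constant-variance assumption is doubtful: large moves tend to follow large moves.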

In the next module, we will introduce a new class of models, **ARCH and GARCH**, designed specifically to model this conditional heteroskedasticity, or volatility clustering.