Lesson 4.2: The Performance Review: R-Squared and Residuals

We've built the OLS model. Now we must judge its performance. This lesson introduces the single most famous statistic in data analysis, R-Squared (R²), by proving the fundamental identity of variance decomposition. We'll also explore the powerful geometric properties of the OLS residuals.

Part 1: The Pie of Total Variation

We've found the "best" possible line, but "best" does not mean "good." If the data is a random cloud, our line is useless. We need a way to measure how much of the "story" in our dependent variable ($Y$) is actually told by our model.

The 'Pie Chart' Analogy

Imagine the total "variation" in your Y variable is a giant pie. This total variation is measured by the **Total Sum of Squares (TSS)**, which is how much the data points vary around their own average, $\bar{Y}$.

$$\text{TSS} = \sum_{i=1}^n (Y_i - \bar{Y})^2$$
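
As a quick illustrative sketch (the numbers below are made up purely for demonstration), TSS is just the sum of squared deviations of $Y$ from its own mean:

```python
import numpy as np

# Hypothetical sample of the dependent variable Y (illustration only).
y = np.array([3.1, 4.5, 5.2, 6.8, 8.0])

# Total Sum of Squares: how much Y varies around its own average.
tss = np.sum((y - y.mean()) ** 2)
print(tss)
```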

Our OLS model's job is to "eat" as much of this pie as it can. The slice it eats is the **Explained Sum of Squares (ESS)**. The slice left over is the **Sum of Squared Residuals (SSR)**.

The big question is: What percentage of the total pie did our model successfully explain?

Part 2: R-Squared and the Variance Decomposition

This leads us to the definition of the most famous metric in statistics.

Definition: R-Squared (R²), The Coefficient of Determination

$R^2$ is the proportion of the total variation in $Y$ (the whole pie) that is explained by our regression model (the slice we ate).

$$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\text{ESS}}{\text{TSS}}$$
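
Here is a minimal NumPy sketch of that ratio, assuming a made-up $(X, Y)$ sample and a simple OLS fit via `np.polyfit`:

```python
import numpy as np

# Hypothetical data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit Y = b0 + b1*X by OLS (a degree-1 polynomial fit).
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)      # total variation (the whole pie)
ess = np.sum((y_hat - y.mean()) ** 2)  # explained variation (the slice we ate)
r_squared = ess / tss
print(r_squared)
```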

This definition only makes sense if the "pie" can be perfectly split. We must now prove the fundamental identity of OLS: that the total variation is *exactly* the sum of the explained and unexplained parts.

Proof: The Variance Decomposition Identity (TSS = ESS + SSR)

Step 1: Decompose the total deviation. We start with the total deviation for one data point, $Y_i - \bar{Y}$, and cleverly add and subtract our predicted value, $\hat{Y}_i$.

$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$$

The first term is the residual ($e_i$). So, $Y_i - \bar{Y} = e_i + (\hat{Y}_i - \bar{Y})$.

Step 2: Square and Sum. We square both sides and sum over all observations.

$$\sum(Y_i - \bar{Y})^2 = \sum \left( e_i + (\hat{Y}_i - \bar{Y}) \right)^2$$

Expanding the right side using $(A+B)^2 = A^2 + B^2 + 2AB$ gives:

$$\underbrace{\sum(Y_i - \bar{Y})^2}_{\text{TSS}} = \underbrace{\sum(\hat{Y}_i - \bar{Y})^2}_{\text{ESS}} + \underbrace{\sum e_i^2}_{\text{SSR}} + 2\sum e_i(\hat{Y}_i - \bar{Y})$$

Step 3: Prove the cross-product is zero. We need to show that the interaction term, $\sum e_i(\hat{Y}_i - \bar{Y})$, is zero. This is a magical property of OLS.

$$\sum e_i(\hat{Y}_i - \bar{Y}) = \sum e_i \hat{Y}_i - \sum e_i \bar{Y} = \sum e_i \hat{Y}_i - \bar{Y}\sum e_i$$

From the first-order conditions of our derivation in Lesson 4.1, we know that the sum of the residuals is zero, $\sum e_i = 0$, so the second term vanishes. The first term also vanishes: writing the fitted value as $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, we get $\sum e_i \hat{Y}_i = \hat{\beta}_0 \sum e_i + \hat{\beta}_1 \sum e_i X_i = 0$, using both first-order conditions ($\sum e_i = 0$ and $\sum e_i X_i = 0$). In other words, the residuals are uncorrelated with the predicted values. Thus, the entire cross-product term is zero.

Conclusion: We are left with the beautiful identity:

$$\mathbf{TSS} = \mathbf{ESS} + \mathbf{SSR}$$
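
A quick numerical check of the identity, using the same style of made-up data as before (nothing depends on the specific numbers):

```python
import numpy as np

# Hypothetical data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)   # OLS fit of Y on X
y_hat = b0 + b1 * x
e = y - y_hat                  # residuals

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum(e ** 2)
cross = np.sum(e * (y_hat - y.mean()))     # the cross-product term

print(np.isclose(tss, ess + ssr))          # True: TSS = ESS + SSR
print(np.isclose(cross, 0.0, atol=1e-10))  # True: the cross-product vanishes
```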

The Alternative Formula for R²

Because $\text{TSS} = \text{ESS} + \text{SSR}$, we can also express R² in terms of the unexplained variation:

$$R^2 = 1 - \frac{\text{SSR}}{\text{TSS}}$$

This is often how it's calculated in software. It answers: "What percentage of the pie is NOT left over?"
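
To illustrate, both formulas give the same number, and (as a sanity check) they match the squared correlation reported by `scipy.stats.linregress`; the data is again hypothetical:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)

print(ess / tss)                    # R² as the slice the model "ate"
print(1 - ssr / tss)                # R² as the share NOT left over
print(linregress(x, y).rvalue**2)   # the same value from SciPy
```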

Part 3: The Geometric Properties of OLS Residuals

The Orthogonality Property

The fact that the cross-product term in our proof was zero is no accident. It's a result of the **Orthogonality** property of OLS. In the language of linear algebra, OLS guarantees that:

  1. The vector of residuals ($\mathbf{e}$) is orthogonal (uncorrelated) to the vector of predictors ($\mathbf{X}$).
  2. The vector of residuals ($\mathbf{e}$) is orthogonal (uncorrelated) to the vector of fitted values ($\mathbf{\hat{y}}$).

What this means: OLS perfectly separates the data into two perpendicular components: the "signal" ($\mathbf{\hat{y}}$), which is a linear function of $\mathbf{X}$, and the "noise" ($\mathbf{e}$), which is completely unrelated to $\mathbf{X}$. Your model has extracted every last drop of linear information.
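
Both claims are easy to check numerically. Here is a small NumPy sketch with made-up data; each dot product should be zero up to floating-point error:

```python
import numpy as np

# Hypothetical data (illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, 1)   # OLS fit of Y on X
y_hat = b0 + b1 * x
e = y - y_hat                  # residual vector

print(np.isclose(e.sum(), 0.0, atol=1e-10))    # residuals sum to zero
print(np.isclose(e @ x, 0.0, atol=1e-10))      # e is orthogonal to the predictor
print(np.isclose(e @ y_hat, 0.0, atol=1e-10))  # e is orthogonal to the fitted values
```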

What's Next? Expanding to Multiple Variables

We have now built, derived, and learned how to evaluate the Simple Linear Regression model. This is the complete toolkit for analyzing a relationship between two variables.

But the real world is complex. An asset's return isn't just affected by the market; it might be affected by interest rates, oil prices, and currency fluctuations. An exam score isn't just affected by study hours; it's affected by sleep, prior GPA, and attendance.

In the next lesson, we will upgrade our engine from a single predictor to handle multiple predictors at once by introducing the **Multiple Linear Regression (MLR) model in Matrix Form**.