Lesson 4.8: F-Tests for Joint Hypotheses

The t-test is our scalpel for testing one coefficient at a time. The F-test is our sledgehammer for testing multiple coefficients at once. This lesson introduces the F-test, which allows us to determine if an entire group of variables—or the model as a whole—is statistically significant.

Part 1: The 'Guilt by Association' Problem

The t-test is perfect for asking, "Is this one variable significant?" But what if we want to ask, "Is this group of variables significant together?" For example, in a model predicting stock returns, we might want to test if all our "value" metrics ($\beta_{P/E}$, $\beta_{P/B}$, $\beta_{D/Y}$) are jointly zero.

Running three separate t-tests is a bad idea. If we test each at $\alpha = 0.05$, the probability of getting at least one false positive (a Type I error) across the three tests is much higher than 5%. We need a single test for the joint hypothesis $H_0: \beta_{P/E} = \beta_{P/B} = \beta_{D/Y} = 0$.
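To see how quickly the error compounds, suppose the three tests were independent. The probability of at least one false positive would be

$$1 - (1 - 0.05)^3 \approx 0.143,$$

roughly a 14% familywise error rate instead of the nominal 5%.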

The 'Handcuffs' Analogy for Model Comparison

The F-test works by comparing the performance of two models:

  • The Restricted Model (R): This is our model with the "handcuffs" on. We force the null hypothesis to be true by removing the variables we are testing.
  • The Unrestricted Model (UR): This is our full model with the handcuffs off, including all the variables.

The F-test asks: "Does removing the handcuffs lead to a *statistically significant* improvement in the model's fit (i.e., a significant drop in the Sum of Squared Residuals)?"

Part 2: The F-Statistic

By definition, the SSR of the restricted model will always be greater than or equal to the SSR of the unrestricted model ($\text{SSR}_R \ge \text{SSR}_{UR}$). The F-statistic measures whether the improvement in fit from removing the restrictions is large enough to be meaningful.

The F-Statistic Formula

The F-statistic is the ratio of the average improvement in fit per restriction to the average unexplained error in the full model.

$$F_{\text{stat}} = \frac{(\text{SSR}_R - \text{SSR}_{UR}) / q}{\text{SSR}_{UR} / (n - k - 1)}$$
  • $q$: The number of restrictions in the null hypothesis (the number of variables "handcuffed" to zero). This is the **numerator degrees of freedom**.
  • $n - k - 1$: The residual degrees of freedom in the unrestricted model, where $k$ is the number of predictors in the unrestricted model. This is the **denominator degrees of freedom**.

Under the null hypothesis, this statistic follows an $F_{q,\, n-k-1}$ distribution.

A **large F-statistic** means that removing the handcuffs caused a huge improvement in fit, providing strong evidence against the null hypothesis. We **Reject H₀** and conclude the group of variables is jointly significant.
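As a concrete sketch, here is how the restricted-versus-unrestricted comparison could be run with statsmodels. The column names (ret, pe, pb, dy, mom) and the simulated data are purely hypothetical illustrations, not part of the lesson's dataset:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with hypothetical column names:
# ret = stock return, pe/pb/dy = "value" metrics, mom = momentum
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "pe":  rng.normal(size=n),
    "pb":  rng.normal(size=n),
    "dy":  rng.normal(size=n),
    "mom": rng.normal(size=n),
})
df["ret"] = 0.5 * df["mom"] + rng.normal(scale=1.0, size=n)

# Unrestricted model (handcuffs off): all predictors included
unrestricted = smf.ols("ret ~ pe + pb + dy + mom", data=df).fit()
# Restricted model (handcuffs on): the three value metrics forced to zero
restricted = smf.ols("ret ~ mom", data=df).fit()

# F-statistic computed directly from the formula above
q = 3                                    # number of restrictions
dof = unrestricted.df_resid              # n - k - 1 in the unrestricted model
f_manual = ((restricted.ssr - unrestricted.ssr) / q) / (unrestricted.ssr / dof)

# The same test via statsmodels' built-in model comparison
f_stat, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f"manual F = {f_manual:.3f}, statsmodels F = {f_stat:.3f}, p-value = {p_value:.4f}")

# Equivalent: state the joint restriction directly on the unrestricted model
print(unrestricted.f_test("pe = 0, pb = 0, dy = 0"))
```

Because the simulated value metrics are unrelated to the simulated returns by construction, the F-statistic here will typically be small and the p-value large, so we would fail to reject H₀.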

Part 3: The Most Common F-Test: Overall Model Significance

Every standard regression output includes an F-statistic for the overall model. This is a special case of our joint test.

  • Null Hypothesis (H₀): All slope coefficients in the model are simultaneously zero: $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$. (The model has no explanatory power.)
  • Alternative Hypothesis (H₁): At least one slope coefficient is not zero. (The model is useful).

In this case, the "restricted model" is a model with only an intercept, for which $\text{SSR}_R = \text{TSS}$. This leads to a beautiful simplification of the F-statistic in terms of R².

F-Statistic for Overall Significance

$$F_{\text{stat}} = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$$
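This follows from the general F-statistic: substitute $\text{SSR}_R = \text{TSS}$ and $q = k$, then divide the numerator and denominator by TSS and use $R^2 = 1 - \text{SSR}_{UR}/\text{TSS}$:

$$F_{\text{stat}} = \frac{(\text{TSS} - \text{SSR}_{UR}) / k}{\text{SSR}_{UR} / (n - k - 1)} = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$$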

This shows that the overall F-test is simply a formal test of whether your model's R-squared is significantly greater than zero.
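As a quick numerical sketch with hypothetical values ($R^2 = 0.15$, $k = 4$ predictors, $n = 200$ observations), the overall F-statistic and its p-value can be computed directly, assuming scipy is available:

```python
from scipy import stats

r2, k, n = 0.15, 4, 200                        # hypothetical R-squared, predictor count, sample size
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # overall F-statistic in terms of R^2
p_value = stats.f.sf(f_stat, k, n - k - 1)     # upper-tail probability under F(k, n - k - 1)
print(f"F = {f_stat:.2f}, p-value = {p_value:.2g}")
# statsmodels reports these same quantities as results.fvalue and results.f_pvalue
```

Here the F-statistic comes out to roughly 8.6 with a very small p-value, so even a modest R² can be highly significant when the sample is large.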

What's Next? Entering 'Act III'

We have now completed "Act II" of our module. We have built the engine (OLS), installed the dashboard (R², t-stats, F-stats), and proved that it's the 'best' possible engine (BLUE) when the rules are followed.

Now we enter "Act III": Real-World Diagnostics. We will become expert mechanics, learning to diagnose and fix our model when the 'rules'—the Classical Assumptions—are broken. The real world is messy, and a true practitioner knows how to handle these challenges.

Our first diagnostic check is for a common problem with our predictors ($\mathbf{X}$): **Multicollinearity**.