Lesson 4.8: F-Tests for Joint Hypotheses
The t-test is our scalpel for testing one coefficient at a time. The F-test is our sledgehammer for testing multiple coefficients at once. This lesson introduces the F-test, which allows us to determine if an entire group of variables—or the model as a whole—is statistically significant.
Part 1: The 'Guilt by Association' Problem
The t-test is perfect for asking, "Is this one variable significant?" But what if we want to ask, "Is this group of variables significant together?" For example, in a model predicting stock returns, we might want to test if all our "value" metrics (with coefficients $\beta_1$, $\beta_2$, and $\beta_3$) are jointly zero.
Running three separate t-tests is a bad idea. If we test each at $\alpha = 0.05$, the probability of getting at least one false positive (a Type I error) across the three tests is much higher than 5%: with independent tests it is $1 - 0.95^3 \approx 14.3\%$. We need a single test for the joint hypothesis $H_0: \beta_1 = \beta_2 = \beta_3 = 0$.
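A quick Monte Carlo sketch makes this inflation visible. This is illustrative Python (assuming numpy and statsmodels are available; the sample size and data-generating process are invented for the demonstration):

```python
import numpy as np
import statsmodels.api as sm

# Illustrative simulation: y is pure noise, so any individual t-test
# that rejects at alpha = 0.05 is a false positive.
rng = np.random.default_rng(1)
n, reps, hits = 100, 2_000, 0
for _ in range(reps):
    X = sm.add_constant(rng.normal(size=(n, 3)))  # three irrelevant predictors
    y = rng.normal(size=n)
    p_slopes = sm.OLS(y, X).fit().pvalues[1:]     # skip the intercept
    hits += (p_slopes < 0.05).any()

print(f"At least one false positive in {hits / reps:.1%} of runs")  # ~14%, not 5%
```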
The 'Handcuffs' Analogy for Model Comparison
The F-test works by comparing the performance of two models:
- The Restricted Model (R): This is our model with the "handcuffs" on. We force the null hypothesis to be true by removing the variables we are testing.
- The Unrestricted Model (UR): This is our full model with the handcuffs off, including all the variables.
The F-test asks: "Does removing the handcuffs lead to a *statistically significant* improvement in the model's fit (i.e., a significant drop in the Sum of Squared Residuals)?"
Part 2: The F-Statistic
By definition, the SSR of the restricted model will always be greater than or equal to the SSR of the unrestricted model ($SSR_R \geq SSR_{UR}$), since removing variables can never improve the in-sample fit. The F-statistic measures whether the improvement from lifting the restrictions is large enough to be meaningful.
The F-Statistic Formula
The F-statistic is the ratio of the average improvement in fit per restriction to the average unexplained error in the full model:

$$F = \frac{(SSR_R - SSR_{UR})/q}{SSR_{UR}/(n - k - 1)}$$

- $q$: The number of restrictions in the null hypothesis (the number of variables "handcuffed" to zero). This is the **numerator degrees of freedom**.
- $n - k - 1$: The degrees of freedom of the unrestricted model, where $n$ is the number of observations and $k$ is the number of slope coefficients. This is the **denominator degrees of freedom**.

Under the null hypothesis, this statistic follows an $F_{q,\,n-k-1}$ distribution.
A **large F-statistic** means that removing the handcuffs caused a huge improvement in fit, providing strong evidence against the null hypothesis. We **Reject H₀** and conclude the group of variables is jointly significant.
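Here is a minimal sketch of the restricted-vs-unrestricted comparison in Python (assuming numpy, scipy, and statsmodels; the data and variable names are illustrative, not from any real dataset):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Illustrative data: y truly depends on x1 only; x2 and x3 are noise
rng = np.random.default_rng(42)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

# Unrestricted model: handcuffs off, all three predictors included
ur = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, x3]))).fit()

# Restricted model: handcuff x2 and x3 to zero by dropping them
r = sm.OLS(y, sm.add_constant(x1)).fit()

# F-statistic for H0: beta_2 = beta_3 = 0, so q = 2 restrictions
q = 2
F = ((r.ssr - ur.ssr) / q) / (ur.ssr / ur.df_resid)
p_value = stats.f.sf(F, q, ur.df_resid)  # upper-tail area of F_{q, n-k-1}
print(f"F = {F:.3f}, p-value = {p_value:.4f}")
```

The same test is available in a single call as `ur.f_test("x2 = 0, x3 = 0")`, using the default column names statsmodels assigns to the design matrix above.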
Part 3: The Most Common F-Test: Overall Model Significance
Every standard regression output includes an F-statistic for the overall model. This is a special case of our joint test.
- Null Hypothesis (H₀): All slope coefficients in the model are simultaneously zero: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$. (The model has no explanatory power).
- Alternative Hypothesis (H₁): At least one slope coefficient is not zero. (The model is useful).
In this case, the "restricted model" is a model with only an intercept, for which $SSR_R = TSS$. Since $R^2 = 1 - SSR_{UR}/TSS$ and the number of restrictions is $q = k$, this leads to a beautiful simplification of the F-statistic in terms of $R^2$.
F-Statistic for Overall Significance

$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$$
This shows that the overall F-test is simply a formal test of whether your model's R-squared is significantly greater than zero.
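As a sanity check on this identity, here is a short sketch (again with illustrative simulated data) that recomputes statsmodels' reported overall F-statistic from R² alone:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: three predictors, only the first matters
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()

# Recompute the overall F-statistic from R² alone
k = res.df_model                      # number of slope coefficients (3 here)
r2 = res.rsquared
F_from_r2 = (r2 / k) / ((1 - r2) / res.df_resid)

print(f"From R²:             F = {F_from_r2:.3f}")
print(f"statsmodels reports: F = {res.fvalue:.3f}")  # identical by construction
```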
What's Next? Entering 'Act III'
We have now completed "Act II" of our module. We have built the engine (OLS), installed the dashboard (R², t-stats, F-stats), and proved that it's the 'best' possible engine (BLUE) when the rules are followed.
Now we enter "Act III": Real-World Diagnostics. We will become expert mechanics, learning to diagnose and fix our model when the 'rules'—the Classical Assumptions—are broken. The real world is messy, and a true practitioner knows how to handle these challenges.
Our first diagnostic check is for a common problem with our predictors (the $X$ variables): **Multicollinearity**.