Lesson 2.7: The F-Distribution (Fisher-Snedecor)

We now meet the final, and most general, of the sampling distributions. The F-distribution is the ultimate tool for comparing sources of variance. We'll derive it as a ratio of two Chi-Squared distributions, which makes it the perfect instrument for testing the overall significance of a model or the joint significance of a group of variables.

Part 1: The Signal-to-Noise Ratio

The t-test is perfect for judging a single variable. But what if we want to answer a bigger question, like "Is my entire model useful?" or "Does adding this group of 3 new features significantly improve my predictions?"

To answer this, we need to compare how much variance our new features *explain* (the signal) against the variance they *don't* explain (the noise). The F-distribution is the tool that governs this comparison.

The Core Idea: The F-distribution is the distribution of a ratio of two independent Chi-Squared variables. Think of it as a "signal-to-noise" distribution.

Definition: The F-Distribution

Let $U \sim \chi^2_{k_1}$ and $V \sim \chi^2_{k_2}$ be two independent Chi-Squared random variables with $k_1$ and $k_2$ degrees of freedom, respectively.

The random variable $F$ defined below follows an F-distribution with $(k_1, k_2)$ degrees of freedom:

$$F = \frac{U / k_1}{V / k_2} \sim F_{k_1, k_2}$$
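To make the definition concrete, here is a minimal simulation sketch (assuming numpy and scipy are installed) that builds F variates directly from two independent Chi-Squared draws and checks a few quantiles against scipy's built-in F-distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k1, k2 = 3, 20          # numerator and denominator degrees of freedom (illustrative choice)
n_draws = 100_000

# Build F variates straight from the definition: a ratio of two scaled Chi-Squared variables.
U = rng.chisquare(df=k1, size=n_draws)
V = rng.chisquare(df=k2, size=n_draws)
F_simulated = (U / k1) / (V / k2)

# Compare simulated quantiles against the exact F(k1, k2) quantiles.
for q in [0.5, 0.9, 0.95]:
    print(q, np.quantile(F_simulated, q), stats.f.ppf(q, dfn=k1, dfd=k2))
```

The simulated and exact quantiles should agree closely, which is just the definition above playing out numerically.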

Part 2: Properties of the F-Distribution

The Two Parameters: Numerator & Denominator df

The shape of the F-distribution is uniquely determined by two different degrees of freedom parameters:

  • $k_1$: The **numerator degrees of freedom** (related to the "signal").
  • $k_2$: The **denominator degrees of freedom** (related to the "noise").

Imagine a plot showing several right-skewed F-distribution curves, labeled with their (k1, k2) df pairs, like F(3, 20) or F(5, 50).

Like the Chi-Squared, the F-distribution is always non-negative and skewed to the right.
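If you want to reproduce a plot like the one described above, here is a minimal sketch (assuming scipy and matplotlib are available) using the df pairs mentioned:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0.01, 4, 400)

# Plot the density for each (k1, k2) pair; every curve is non-negative and right-skewed.
for k1, k2 in [(3, 20), (5, 50)]:
    plt.plot(x, stats.f.pdf(x, dfn=k1, dfd=k2), label=f"F({k1}, {k2})")

plt.xlabel("F value")
plt.ylabel("Density")
plt.title("F-distributions for several (k1, k2) pairs")
plt.legend()
plt.show()
```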

Relationship to Other Distributions

The F-distribution elegantly connects to the t-distribution.

$$(t_k)^2 \sim F_{1, k}$$

This shows that a t-test on a single coefficient is just a special case of an F-test where you are testing only one restriction ($k_1 = 1$).
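A quick way to convince yourself of this relationship is to square simulated t-variates and compare their tail behavior to an $F_{1, k}$ distribution. A small sketch, assuming numpy and scipy, with $k = 20$ chosen arbitrarily:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 20
t_draws = rng.standard_t(df=k, size=100_000)
squared = t_draws ** 2

# The upper-tail probabilities should agree: P(t_k^2 > c) vs P(F_{1,k} > c).
c = 4.0
print((squared > c).mean())            # simulated tail probability of the squared t
print(stats.f.sf(c, dfn=1, dfd=k))     # exact F(1, k) tail probability
# Equivalently, the two-sided t p-value equals the one-sided F p-value.
print(2 * stats.t.sf(np.sqrt(c), df=k))
```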

Part 3: The Connection to Regression Models

The F-statistic you see in every regression output is a direct application of this definition.

Deriving the OLS F-Statistic

When we test if a group of $q$ variables is jointly significant, we are comparing a restricted model (R) to an unrestricted model (UR).

Step 1: Define the Signal and Noise.

  • Signal (U): The reduction in the sum of squared residuals from adding the $q$ variables. We know $\frac{SSR_R - SSR_{UR}}{\sigma^2} \sim \chi^2_q$. So $U = \frac{SSR_R - SSR_{UR}}{\sigma^2}$ and $k_1 = q$.
  • Noise (V): The remaining sum of squared residuals in the full model. We know $\frac{SSR_{UR}}{\sigma^2} \sim \chi^2_{n-p}$. So $V = \frac{SSR_{UR}}{\sigma^2}$ and $k_2 = n - p$, where $p$ is the number of parameters estimated in the unrestricted model (including the intercept).

Step 2: Construct the F-ratio.

$$F = \frac{U / k_1}{V / k_2} = \frac{\bigl( (SSR_R - SSR_{UR}) / \sigma^2 \bigr) / q}{\bigl( SSR_{UR} / \sigma^2 \bigr) / (n - p)}$$

Step 3: The $\sigma^2$ terms cancel. This is the magic! We don't need to know the true error variance to calculate the statistic.

$$F_{\text{stat}} = \frac{(SSR_R - SSR_{UR}) / q}{SSR_{UR} / (n - p)}$$

This final formula is exactly the F-statistic used for joint hypothesis testing.
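As an illustration, here is a hand-rolled sketch of the joint test on simulated data, using only numpy for the OLS fits and scipy for the p-value. The data-generating process and variable names are invented for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)   # x3 is truly irrelevant here

def ssr(X, y):
    """Sum of squared residuals from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_restricted = np.column_stack([ones, x1])              # restricted model: intercept + x1
X_unrestricted = np.column_stack([ones, x1, x2, x3])    # adds the q = 2 candidate variables

q = 2                        # number of restrictions being tested
p = X_unrestricted.shape[1]  # parameters estimated in the unrestricted model
SSR_R = ssr(X_restricted, y)
SSR_UR = ssr(X_unrestricted, y)

F_stat = ((SSR_R - SSR_UR) / q) / (SSR_UR / (n - p))
p_value = stats.f.sf(F_stat, dfn=q, dfd=n - p)
print(F_stat, p_value)
```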

The Payoff: The Arbitrator of Models

    The F-test is the primary tool for making decisions about the structure of a model.

    • Overall Model Significance: The F-statistic reported at the top of every regression output tests the null hypothesis that all slope coefficients are jointly zero ($H_0: \beta_1 = \beta_2 = \dots = 0$). It answers the question, "Is this model, as a whole, better than just predicting the mean?" (see the sketch after this list).
    • Feature Selection: In machine learning, the F-test is used to decide if adding a group of new features (e.g., adding polynomial terms or interaction effects) provides a statistically significant improvement in model fit, helping to prevent overfitting.
    • Testing Economic Theories: In finance, the F-test is used to test complex hypotheses, such as the Capital Asset Pricing Model (CAPM), by checking whether a group of portfolio "alphas" (intercepts) is jointly equal to zero.
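For instance, here is one way to read the overall-significance F-statistic and its p-value off a fitted regression, on simulated data and assuming statsmodels is the library in use:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 3))
y = 0.4 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(size=n)

# Fit OLS with an intercept; the summary's top-line F-test is H0: all slopes are zero.
results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.fvalue, results.f_pvalue)
```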

What's Next? The Magic of Large Numbers

Congratulations! You have now met the entire family of sampling distributions (χ², t, and F) that form the foundation of classical statistical inference.

But all of these rely on a strong, often-violated assumption: that our data comes from a Normal distribution. What happens in the real world when our data is skewed or has weird properties? Can we still do statistics?

The answer is a resounding YES, thanks to the magic of **Asymptotic Theory**. The next part of our module introduces the two most powerful theorems in all of statistics: the Law of Large Numbers and the Central Limit Theorem.