Lesson 2.7: The F-Distribution (Fisher-Snedecor)

We now meet the final, and most general, of the sampling distributions. The F-distribution is the ultimate tool for comparing sources of variance. We'll derive it as a ratio of two Chi-Squared distributions, which makes it the perfect instrument for testing the overall significance of a model or the joint significance of a group of variables.

Part 1: The Signal-to-Noise Ratio

The t-test is perfect for judging a single variable. But what if we want to answer a bigger question, like "Is my entire model useful?" or "Does adding this group of 3 new features significantly improve my predictions?"

To answer this, we need to compare how much variance our new features *explain* (the signal) against the variance they *don't* explain (the noise). The F-distribution is the tool that governs this comparison.

The Core Idea: The F-distribution is the distribution of a ratio of two independent Chi-Squared variables. Think of it as a "signal-to-noise" distribution.

Definition: The F-Distribution

Let $U \sim \chi^2_{k_1}$ and $V \sim \chi^2_{k_2}$ be two independent Chi-Squared random variables with $k_1$ and $k_2$ degrees of freedom, respectively.

The random variable $F$ defined below follows an F-distribution with $(k_1, k_2)$ degrees of freedom:

$$F = \frac{U / k_1}{V / k_2} \sim F_{k_1, k_2}$$
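To make the definition concrete, here is a minimal simulation sketch (assuming numpy and scipy are installed) that builds F variates directly from two independent Chi-Squared draws and checks a few quantiles against scipy's built-in F-distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
k1, k2 = 3, 20          # numerator and denominator degrees of freedom (illustrative choice)
n_draws = 100_000

# Build F variates straight from the definition: a ratio of two scaled Chi-Squared variables.
U = rng.chisquare(df=k1, size=n_draws)
V = rng.chisquare(df=k2, size=n_draws)
F_simulated = (U / k1) / (V / k2)

# Compare simulated quantiles against the exact F(k1, k2) quantiles.
for q in [0.5, 0.9, 0.95]:
    print(q, np.quantile(F_simulated, q), stats.f.ppf(q, dfn=k1, dfd=k2))
```

The simulated and exact quantiles should agree closely, which is just the definition above playing out numerically.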

Part 2: Properties of the F-Distribution

The Two Parameters: Numerator & Denominator df

The shape of the F-distribution is uniquely determined by two different degrees of freedom parameters:

  • $k_1$: The **numerator degrees of freedom** (related to the "signal").
  • $k_2$: The **denominator degrees of freedom** (related to the "noise").

Imagine a plot showing several right-skewed F-distribution curves, labeled with their (k1, k2) df pairs, like F(3, 20) or F(5, 50).

Like the Chi-Squared, the F-distribution is always non-negative and skewed to the right.
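If you want to reproduce a plot like the one described above, here is a minimal sketch (assuming scipy and matplotlib are available) using the df pairs mentioned:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0.01, 4, 400)

# Plot the density for each (k1, k2) pair; every curve is non-negative and right-skewed.
for k1, k2 in [(3, 20), (5, 50)]:
    plt.plot(x, stats.f.pdf(x, dfn=k1, dfd=k2), label=f"F({k1}, {k2})")

plt.xlabel("F value")
plt.ylabel("Density")
plt.title("F-distributions for several (k1, k2) pairs")
plt.legend()
plt.show()
```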

Relationship to Other Distributions

The F-distribution elegantly connects to the t-distribution.

$$(t_k)^2 \sim F_{1, k}$$

This shows that a t-test on a single coefficient is just a special case of an F-test where you are testing only one restriction ($k_1 = 1$).
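A quick way to convince yourself of this relationship is to square simulated t-variates and compare their tail behavior to an $F_{1, k}$ distribution. A small sketch, assuming numpy and scipy, with $k = 20$ chosen arbitrarily:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 20
t_draws = rng.standard_t(df=k, size=100_000)
squared = t_draws ** 2

# The upper-tail probabilities should agree: P(t_k^2 > c) vs P(F_{1,k} > c).
c = 4.0
print((squared > c).mean())            # simulated tail probability of the squared t
print(stats.f.sf(c, dfn=1, dfd=k))     # exact F(1, k) tail probability
# Equivalently, the two-sided t p-value equals the one-sided F p-value.
print(2 * stats.t.sf(np.sqrt(c), df=k))
```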

Part 3: The Connection to Regression Models

The F-statistic you see in every regression output is a direct application of this definition.

Deriving the OLS F-Statistic

When we test if a group of $q$ variables is jointly significant, we are comparing a restricted model (R) to an unrestricted model (UR).

Step 1: Define the Signal and Noise.

  • Signal (U): The reduction in the sum of squared residuals from adding the $q$ variables. We know $\frac{SSR_R - SSR_{UR}}{\sigma^2} \sim \chi^2_q$. So $U = \frac{SSR_R - SSR_{UR}}{\sigma^2}$ and $k_1 = q$.
  • Noise (V): The remaining sum of squared residuals in the full model. We know $\frac{SSR_{UR}}{\sigma^2} \sim \chi^2_{n-p}$. So $V = \frac{SSR_{UR}}{\sigma^2}$ and $k_2 = n - p$, where $p$ is the number of parameters estimated in the unrestricted model (including the intercept).

Step 2: Construct the F-ratio.

$$F = \frac{U / k_1}{V / k_2} = \frac{\bigl( (SSR_R - SSR_{UR}) / \sigma^2 \bigr) / q}{\bigl( SSR_{UR} / \sigma^2 \bigr) / (n - p)}$$

Step 3: The $\sigma^2$ terms cancel. This is the magic! We don't need to know the true error variance to calculate the statistic.

$$F_{\text{stat}} = \frac{(SSR_R - SSR_{UR}) / q}{SSR_{UR} / (n - p)}$$

This final formula is exactly the F-statistic used for joint hypothesis testing.
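As an illustration, here is a hand-rolled sketch of the joint test on simulated data, using only numpy for the OLS fits and scipy for the p-value. The data-generating process and variable names are invented for the example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)   # x3 is truly irrelevant here

def ssr(X, y):
    """Sum of squared residuals from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_restricted = np.column_stack([ones, x1])              # restricted model: intercept + x1
X_unrestricted = np.column_stack([ones, x1, x2, x3])    # adds the q = 2 candidate variables

q = 2                        # number of restrictions being tested
p = X_unrestricted.shape[1]  # parameters estimated in the unrestricted model
SSR_R = ssr(X_restricted, y)
SSR_UR = ssr(X_unrestricted, y)

F_stat = ((SSR_R - SSR_UR) / q) / (SSR_UR / (n - p))
p_value = stats.f.sf(F_stat, dfn=q, dfd=n - p)
print(F_stat, p_value)
```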

The Payoff: The Arbitrator of Models

    The F-test is the primary tool for making decisions about the structure of a model.

    • Overall Model Significance: The F-statistic reported at the top of every regression output tests the null hypothesis that all slope coefficients are jointly zero ($H_0: \beta_1 = \beta_2 = \dots = 0$). It answers the question, "Is this model, as a whole, better than just predicting the mean?" (see the sketch after this list).
    • Feature Selection: In machine learning, the F-test is used to decide if adding a group of new features (e.g., adding polynomial terms or interaction effects) provides a statistically significant improvement in model fit, helping to prevent overfitting.
    • Testing Economic Theories: In finance, the F-test is used to test complex hypotheses, such as the Capital Asset Pricing Model (CAPM), by checking whether a group of portfolio "alphas" (intercepts) is jointly equal to zero.
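For instance, here is one way to read the overall-significance F-statistic and its p-value off a fitted regression, on simulated data and assuming statsmodels is the library in use:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 3))
y = 0.4 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(size=n)

# Fit OLS with an intercept; the summary's top-line F-test is H0: all slopes are zero.
results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.fvalue, results.f_pvalue)
```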

What's Next? The Magic of Large Numbers

Congratulations! You have now met the entire family of sampling distributions (χ², t, and F) that form the foundation of classical statistical inference.

But all of these rely on a strong, often-violated assumption: that our data comes from a Normal distribution. What happens in the real world when our data is skewed or has weird properties? Can we still do statistics?

The answer is a resounding YES, thanks to the magic of **Asymptotic Theory**. The next part of our module introduces the two most powerful theorems in all of statistics: the Law of Large Numbers and the Central Limit Theorem.