Lesson 4.7: t-Tests for Individual Coefficients
We have our OLS estimates. Now, we must act as judge and jury. This lesson introduces the t-test, the fundamental tool for determining if an estimated coefficient reflects a real, underlying relationship or if it's merely a product of random sampling noise.
Part 1: The Core Question of Inference
We've run our regression and found an estimate, for example, $\hat{\beta}_1 = 0.5$. This number is our "best guess" from our specific sample.
But a different sample would give a different estimate. The fundamental question of inference is:
Is the true, unobservable effect $\beta_1$ actually zero, with our estimate of 0.5 just random noise? Or does our estimate reflect a real, non-zero relationship?
To answer this, we use the formal framework of hypothesis testing.
The Hypothesis Test for a Single Coefficient
- Null Hypothesis ($H_0: \beta_j = 0$): The "presumption of innocence." We assume the variable has no effect on $y$.
- Alternative Hypothesis ($H_1: \beta_j \neq 0$): The claim we seek evidence for: the variable has a real, non-zero effect on $y$.
Part 2: Constructing the Test Statistic
To test our hypothesis, we need to create a test statistic whose distribution is known when the null hypothesis is true. As we saw in Module 3, the perfect tool for testing a mean when the population variance is unknown is the t-statistic.
The 'Signal-to-Noise' Ratio
The intuition behind the t-statistic is that it forms a signal-to-noise ratio:

$$ t = \frac{\text{Signal}}{\text{Noise}} = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)} $$

A large t-statistic (e.g., $|t| > 2$) suggests the signal is strong relative to the random noise, making us doubt the null hypothesis of "no effect."

- The Signal: This is our point estimate, $\hat{\beta}_j$.
- The Noise (Standard Error): The standard error of our estimate, $\operatorname{se}(\hat{\beta}_j)$, measures the typical "wobble" or sampling variability of $\hat{\beta}_j$. To calculate it, we need an estimate of the error variance, $\sigma^2$.
Deriving the Standard Error
1. Estimate the Error Variance: Our unbiased estimator for the error variance is:

$$ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - k - 1} $$

where $n$ is the sample size, $k$ is the number of predictors, and $k + 1$ is the number of estimated parameters (the $k$ slopes plus the intercept).

2. Estimate the Covariance Matrix of $\hat{\boldsymbol{\beta}}$: We take the true variance formula, $\operatorname{Var}(\hat{\boldsymbol{\beta}} \mid X) = \sigma^2 (X'X)^{-1}$, and plug in our estimate $\hat{\sigma}^2$:

$$ \widehat{\operatorname{Var}}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 (X'X)^{-1} $$

3. Find the Standard Error: The standard error of a single coefficient $\hat{\beta}_j$ is the square root of the $j$-th diagonal element of this estimated matrix:

$$ \operatorname{se}(\hat{\beta}_j) = \sqrt{\left[\hat{\sigma}^2 (X'X)^{-1}\right]_{jj}} $$
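Here is a minimal NumPy sketch of these three steps on simulated data (the sample size, design matrix, and variable names are illustrative assumptions, not part of the lesson):

```python
import numpy as np

# Simulated illustration: n = 50 observations, k = 2 predictors plus an intercept
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # design matrix with intercept
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=2.0, size=n)

# OLS estimates via the normal equations: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Step 1: unbiased estimate of the error variance, SSR / (n - k - 1)
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - k - 1)

# Step 2: estimated covariance matrix of beta_hat: sigma2_hat * (X'X)^{-1}
var_beta_hat = sigma2_hat * XtX_inv

# Step 3: standard errors are the square roots of the diagonal elements
se_beta_hat = np.sqrt(np.diag(var_beta_hat))
print(beta_hat, se_beta_hat)
```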
The t-Statistic for a Single Coefficient

$$ t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)} $$

Under the null hypothesis $H_0: \beta_j = 0$ (and the CLM assumptions, including Normality), this statistic follows a t-distribution with $n - k - 1$ degrees of freedom.
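As a cross-check, a short sketch (assuming the statsmodels package and the same simulated data as above) shows that a standard OLS routine's reported t-values are exactly this coefficient-to-standard-error ratio:

```python
import numpy as np
import statsmodels.api as sm

# Same illustrative simulated data as in the previous sketch
rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=2.0, size=n)

res = sm.OLS(y, X).fit()
print(res.tvalues)            # t-statistics reported by the library
print(res.params / res.bse)   # identical: coefficient divided by its standard error
print(res.df_resid)           # degrees of freedom: n - k - 1 = 47
```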
Part 3: Making the Decision
We use the p-value associated with our calculated $t_{\hat{\beta}_j}$ to make our decision, following the logic from Module 3.
"If the p-value is low, the null must go."
- We choose a significance level, $\alpha$ (our threshold for "reasonable doubt," usually 0.05).
- We calculate the $p$-value: the probability of getting a t-statistic at least as extreme as ours if the null hypothesis were true.
- We compare. If $p < \alpha$, the result is "statistically significant": the evidence is strong enough to overturn the presumption of innocence, and we **Reject H₀**.
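A minimal sketch of this decision rule, assuming SciPy and purely illustrative numbers (a t-statistic of 2.3 with 47 degrees of freedom, chosen for the example rather than taken from the lesson):

```python
from scipy import stats

# Illustrative inputs: t-statistic, degrees of freedom (n - k - 1), significance level
t_stat, df, alpha = 2.3, 47, 0.05

# Two-sided p-value: probability of a |t| at least this extreme when H0 is true
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```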
The Confidence Interval Connection
The t-test is directly linked to the confidence interval for the coefficient:

$$ \hat{\beta}_j \pm t_{\alpha/2,\, n-k-1} \cdot \operatorname{se}(\hat{\beta}_j) $$

The two methods are perfectly equivalent:
Rejecting $H_0: \beta_j = 0$ at the 5% significance level is mathematically identical to finding that the 95% confidence interval for $\beta_j$ **does not contain zero**.
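A short sketch of this equivalence, again with SciPy and illustrative numbers ($\hat{\beta}_j = 0.5$, $\operatorname{se}(\hat{\beta}_j) = 0.2$, 47 degrees of freedom):

```python
from scipy import stats

# Illustrative estimate and standard error (assumed values for demonstration)
beta_hat, se, df, alpha = 0.5, 0.2, 47, 0.05

# 95% confidence interval: beta_hat +/- t_crit * se
t_crit = stats.t.ppf(1 - alpha / 2, df)
ci_low, ci_high = beta_hat - t_crit * se, beta_hat + t_crit * se

# The two decisions always agree: reject H0 at alpha <=> 0 lies outside the CI
rejects_h0 = abs(beta_hat / se) > t_crit
zero_outside_ci = not (ci_low <= 0 <= ci_high)
print((round(ci_low, 3), round(ci_high, 3)), rejects_h0, zero_outside_ci)
```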
What's Next? Testing the Whole Model
The t-test is our precision tool for examining one coefficient at a time. It tells us if a single variable is a significant predictor.
But what if we want to ask a bigger question? For example, are *any* of our variables useful? Is our entire model better than just predicting the average of $y$? Or is a specific *group* of variables (e.g., all the variables related to company size) jointly significant?
To answer these questions about multiple coefficients at once, we need a different tool: the **F-test**.