Lesson 3.8: Applying the Recipe: CIs for Mean and Variance

We now apply the general 'pivotal method' to construct the two most common confidence intervals. We'll see why the t-distribution is the correct tool for the job when estimating a mean, and why the Chi-squared distribution is necessary when estimating variance.

Part 1: Confidence Interval for a Population Mean (μ)

The Real-World Problem: σ is Unknown

In our general derivation, we used the Z-statistic as our pivot. But this pivot, $(\bar{X} - \mu) / (\sigma/\sqrt{n})$, has a fatal flaw: we almost never know the true population standard deviation, $\sigma$.

The only practical solution is to replace $\sigma$ with our sample estimate, $s$. As we learned in Module 2, this substitution changes the distribution of our pivot.

Choosing the Right Tool: The t-distribution

When we use the sample standard deviation $s$ in our pivot, we introduce extra uncertainty into our calculation. The t-distribution, with its "fatter tails," is the distribution designed to account for precisely this extra uncertainty.
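The effect of the fatter tails can be checked with a small simulation. The sketch below is illustrative only: the function name `tail_rate`, the sample sizes, the trial count, and the seed are all assumptions chosen for the example. It draws normal samples, builds the pivot with $s$ in place of $\sigma$, and counts how often the pivot falls beyond the Z cutoff of ±1.96.

```python
import random
import statistics

def tail_rate(n, trials=4000, cutoff=1.96, seed=0):
    """Fraction of simulated pivots (xbar - mu) / (s / sqrt(n)) beyond +/- cutoff."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        # Draw a sample from a normal population with true mu = 0, sigma = 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        pivot = xbar / (s / n ** 0.5)  # mu = 0, so the numerator is just xbar
        if abs(pivot) > cutoff:
            exceed += 1
    return exceed / trials

small = tail_rate(n=5)
large = tail_rate(n=100)
print(f"n=5:   {small:.3f} of pivots beyond +/-1.96")
print(f"n=100: {large:.3f} of pivots beyond +/-1.96")
```

If the pivot were standard normal, about 5% of draws would land beyond ±1.96. With $n = 5$ the simulated share should come out well above that, while with $n = 100$ it should sit close to 5%, which is the extra uncertainty the t-distribution accounts for.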

The correct pivot for the mean when $\sigma$ is unknown, assuming the data come from a normal population, is the **t-statistic**:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1}$$

Deriving the t-Interval

We follow the exact same algebraic inversion from the previous lesson, but we use the t-pivot and its critical values, $\pm t_{\alpha/2, n-1}$, which are slightly larger in magnitude than the corresponding Z-values and therefore give a slightly wider interval.

Starting with $P\left(-t_{\alpha/2, n-1} \le \frac{\bar{X} - \mu}{s / \sqrt{n}} \le t_{\alpha/2, n-1}\right) = 1-\alpha$ and isolating $\mu$ leads directly to the final formula.
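Written out step by step, with $t$ as shorthand for $t_{\alpha/2, n-1}$, the isolation of $\mu$ runs as follows: multiply through by $s/\sqrt{n}$, subtract $\bar{X}$, then multiply by $-1$, which reverses the inequalities.

$$
\begin{aligned}
-t \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t
&\iff -t \cdot \frac{s}{\sqrt{n}} \le \bar{X} - \mu \le t \cdot \frac{s}{\sqrt{n}} \\
&\iff -\bar{X} - t \cdot \frac{s}{\sqrt{n}} \le -\mu \le -\bar{X} + t \cdot \frac{s}{\sqrt{n}} \\
&\iff \bar{X} - t \cdot \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t \cdot \frac{s}{\sqrt{n}}
\end{aligned}
$$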

The t-Confidence Interval for the Population Mean μ

This is one of the most widely used confidence intervals in science and industry.

$$\text{C.I.} = \left[ \bar{X} - t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}, \quad \bar{X} + t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}} \right]$$
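As a rough illustration of the formula, here is a minimal Python sketch. The helper name `t_interval` and the ten sample values are made up for the example. The critical value $t_{0.025, 9} \approx 2.262$ is taken from a standard t-table and passed in by hand; in practice it would usually come from a statistics library.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """t-interval for the mean: xbar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample of n = 10 measurements (illustrative data only).
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.5, 10.1]

# Assumed table value: t_{0.025, 9}, the 95% critical value for 9 degrees of freedom.
T_CRIT_95_DF9 = 2.262

low, high = t_interval(sample, T_CRIT_95_DF9)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

For this sample the mean is 10.1 and the interval comes out at roughly 9.92 to 10.28. Using the Z value 1.96 instead of 2.262 would give a narrower interval that understates the uncertainty at this sample size.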

Part 2: Confidence Interval for a Population Variance (σ²)

To build an interval for variance, we need a different pivot, one that relates our sample variance ($s^2$) to the true population variance ($\sigma^2$).

Choosing the Right Tool: The Chi-Squared (χ²) Distribution

As we learned in Module 2, the distribution that governs the behavior of sample variance (under normality) is the Chi-Squared distribution.

The correct pivot for the variance is:

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$

A key feature of the $\chi^2$ distribution is that it is **not symmetric**. This makes the derivation a bit trickier.

Derivation: The Chi-Squared Interval for σ²

Step 1: Define the probability region. Because the distribution is skewed, we need two different critical values to chop off $\alpha/2$ from each tail.

  • Lower Critical Value: $\chi^2_{1-\alpha/2, n-1}$ (the value with $\alpha/2$ area to its left).
  • Upper Critical Value: $\chi^2_{\alpha/2, n-1}$ (the value with $\alpha/2$ area to its right).

$$P\left( \chi^2_{1-\alpha/2, n-1} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2, n-1} \right) = 1-\alpha$$

Step 2: Isolate $\sigma^2$. Since $\sigma^2$ is in the denominator, we must invert all parts of the inequality. Every term is positive, so taking reciprocals **reverses the direction** of the inequalities.

$$P\left( \frac{1}{\chi^2_{1-\alpha/2, n-1}} \ge \frac{\sigma^2}{(n-1)s^2} \ge \frac{1}{\chi^2_{\alpha/2, n-1}} \right) = 1-\alpha$$

Step 3: Solve for $\sigma^2$ and reorder. Multiply by $(n-1)s^2$ and then flip the expression to the standard format (lower bound on the left).
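Spelled out, multiplying every term by the positive quantity $(n-1)s^2$ keeps the inequality directions, and rewriting the chain with the smaller bound on the left gives:

$$
\frac{(n-1)s^2}{\chi^2_{1-\alpha/2, n-1}} \ge \sigma^2 \ge \frac{(n-1)s^2}{\chi^2_{\alpha/2, n-1}}
\quad\iff\quad
\frac{(n-1)s^2}{\chi^2_{\alpha/2, n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2, n-1}}
$$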

The Confidence Interval for the Population Variance σ²

$$\text{C.I.} = \left[ \frac{(n-1)s^2}{\chi^2_{\alpha/2, n-1}}, \quad \frac{(n-1)s^2}{\chi^2_{1-\alpha/2, n-1}} \right]$$

Important: Notice how the *upper* Chi-squared critical value appears in the *lower* bound of the interval, and the *lower* $\chi^2$ value forms the *upper* bound. This is a direct result of the inversion in Step 2.
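The swap of critical values is easy to get wrong in code, so a small sketch may help. The helper name `variance_interval` and the sample values are hypothetical. The two critical values for 9 degrees of freedom at the 95% level (about 2.700 and 19.023) are taken from a standard Chi-squared table and passed in by hand.

```python
import statistics

def variance_interval(sample, chi2_lower, chi2_upper):
    """CI for sigma^2: [(n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower]."""
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance (n - 1 divisor)
    scaled = (n - 1) * s2
    # The UPPER critical value produces the LOWER bound, and vice versa.
    return scaled / chi2_upper, scaled / chi2_lower

# Hypothetical sample of n = 10 measurements (illustrative data only).
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.5, 10.1]

# Assumed table values for 9 degrees of freedom, alpha = 0.05:
CHI2_LOWER = 2.700   # alpha/2 = 0.025 area to its left
CHI2_UPPER = 19.023  # alpha/2 = 0.025 area to its right

var_low, var_high = variance_interval(sample, CHI2_LOWER, CHI2_UPPER)
s2 = statistics.variance(sample)
print(f"s^2 = {s2:.4f}, 95% CI for variance: [{var_low:.4f}, {var_high:.4f}]")
```

For this sample $s^2 \approx 0.067$ and the interval is roughly 0.032 to 0.222. The point estimate sits much closer to the lower bound than to the upper one, which reflects the skew of the $\chi^2$ distribution.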

The Payoff: Quantifying Uncertainty in Practice
    • CIs for Regression Coefficients: Every OLS regression output shows a 95% CI for each coefficient $\hat{\beta}_j$. That interval follows the same t-interval recipe as this lesson: $\hat{\beta}_j \pm t_{crit} \cdot \text{se}(\hat{\beta}_j)$, with the degrees of freedom for $t_{crit}$ given by the regression's residual degrees of freedom rather than $n-1$. It tells you the range of plausible values for the true effect of that variable.
    • CIs for Financial Volatility ($\sigma$): A risk manager needs to know the plausible range for an asset's true volatility. They use the Chi-squared method to find the CI for the variance ($\sigma^2$), and then simply **take the square root of both ends of the interval** to get the CI for volatility ($\sigma$). This works because the square root is an increasing function, so it preserves the order of the bounds. The result gives a "best case" and "worst case" for risk.
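The volatility case can be sketched in a few lines of Python. Everything here is illustrative: the helper name `volatility_interval` and the ten daily returns are invented, and the Chi-squared critical values for 9 degrees of freedom (about 2.700 and 19.023) are assumed from a standard table. The method also relies on the returns being roughly normal.

```python
import math
import statistics

def volatility_interval(returns, chi2_lower, chi2_upper):
    """CI for sigma: square roots of the Chi-squared CI bounds for sigma^2."""
    n = len(returns)
    scaled = (n - 1) * statistics.variance(returns)
    var_low = scaled / chi2_upper   # upper critical value -> lower bound
    var_high = scaled / chi2_lower  # lower critical value -> upper bound
    # sqrt is increasing, so it maps the variance bounds to volatility bounds.
    return math.sqrt(var_low), math.sqrt(var_high)

# Hypothetical daily returns for one asset (illustrative data only).
returns = [0.012, -0.008, 0.005, -0.015, 0.009,
           0.003, -0.004, 0.011, -0.010, 0.002]

vol_low, vol_high = volatility_interval(returns, chi2_lower=2.700, chi2_upper=19.023)
print(f"sample volatility: {statistics.stdev(returns):.4f}")
print(f"95% CI for volatility: [{vol_low:.4f}, {vol_high:.4f}]")
```

With only ten observations the upper bound is more than twice the lower bound, a reminder of how imprecise volatility estimates from short samples can be.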

What's Next? From Ranges to Decisions

Confidence intervals are a powerful tool for quantifying our uncertainty about an estimate. They give us a range of plausible values for the truth.

But often, we need to make a firm, binary decision. Is this new drug effective, yes or no? Is this factor's beta equal to zero, yes or no? This requires a more formal decision-making framework.

In the next lesson, we will introduce the language and logic of **Hypothesis Testing**.

Up Next: Let's Make a Decision: Hypothesis Testing