Lesson 3.7: General Construction of Confidence Intervals (CIs)

We now begin 'Act III' of our module: Inference. A single point estimate is our 'best guess,' but it almost never equals the true parameter exactly. In this lesson, we learn how to build a Confidence Interval (CI), a range of plausible values for the true parameter, and, crucially, how to interpret it correctly.

Part 1: From a Point to a Range

So far, we have focused on finding a **point estimate**, like $\hat{\beta}_1 = 1.2$. This is our single best guess for the true parameter.

The problem? It gives us no sense of our **precision** or **uncertainty**. Is the true $\beta_1$ likely between 1.1 and 1.3? Or is it between -5.0 and 7.4? Our point estimate $\hat{\beta}_1=1.2$ is the same in both cases, but our confidence in it is vastly different. A Confidence Interval solves this by providing a "margin of error" around our best guess.

Definition: Confidence Interval

A $100(1-\alpha)\%$ Confidence Interval for a parameter $\theta$ is a random interval, calculated from the sample, which contains the true (unknown) population parameter $\theta$ in $100(1-\alpha)\%$ of repeated experiments.

The #1 Most Important Interpretation

The meaning of "95% confidence" is one of the most misunderstood concepts in introductory statistics.

WRONG INTERPRETATION:

"There is a 95% probability that the true mean $\mu$ is inside my calculated interval [10, 20]."

CORRECT INTERPRETATION:

"I am 95% confident in the *method* I used to construct this interval. If I were to draw 100 different samples and construct an interval from each one, I would expect about 95 of those intervals to capture the true mean $\mu$."

The "Fishing Net" Analogy:

The true parameter $\mu$ is a fixed, stationary fish in a lake. Your confidence interval is a fishing net. A "95% confidence level" means you have a method of throwing the net that will succeed in catching the fish 95% of the time. Once a particular net has been thrown, the fish is either in it or not. The probability is in your *method*, not in the location of the fish.
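To see the "capture rate" in action, here is a small simulation sketch. It assumes a normal population with made-up values ($\mu = 50$, $\sigma = 10$, $n = 25$) and uses the known-$\sigma$ interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ that is derived in Part 2. The function name `coverage_rate` and every number in it are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(mu=50.0, sigma=10.0, n=25, trials=10_000, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        # Throw the net once: draw a fresh sample and build its interval.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1  # this net caught the fish
    return hits / trials


rate = coverage_rate()
print(f"Share of intervals that captured mu: {rate:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains $\mu$ or it does not; the 95% describes how often the procedure succeeds over many samples.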

Part 2: The Engine of CIs: The Pivotal Method

How do we construct an interval with this "95% capture rate" property? We need a special tool called a **pivotal quantity**.

What is a Pivot?

A "pivot" is a function of our data and the unknown parameter whose own probability distribution is known and **does not depend on the parameter**.

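Example: The Z-statistic is the perfect pivot for the mean $\mu$ (when the data are normally distributed and $\sigma$ is known).

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

For a normal population, this quantity follows a $\mathcal{N}(0,1)$ distribution exactly, regardless of the true value of $\mu$ (for other populations this holds approximately when $n$ is large, by the Central Limit Theorem). This stability is what allows us to build the interval.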
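The "does not depend on the parameter" property can be checked numerically. The sketch below simulates the Z-statistic under three very different, made-up values of $\mu$ and compares the mean and standard deviation of the simulated values. The helper `simulate_z` and the settings ($\sigma = 2$, $n = 16$) are assumptions chosen only for illustration.

```python
import random
from statistics import fmean, pstdev


def simulate_z(mu, sigma=2.0, n=16, reps=5_000, seed=0):
    """Simulate Z = (x_bar - mu) / (sigma / sqrt(n)) for normal samples."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        zs.append((x_bar - mu) / (sigma / n ** 0.5))
    return zs


results = {}
for seed, mu in enumerate((-100.0, 0.0, 3.5)):
    zs = simulate_z(mu, seed=seed)
    results[mu] = (fmean(zs), pstdev(zs))
    print(f"mu={mu:7.1f}  mean(Z)={results[mu][0]:6.3f}  sd(Z)={results[mu][1]:5.3f}")
```

In each case the simulated Z values should have a mean near 0 and a standard deviation near 1, whatever value of $\mu$ generated the data.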

Derivation: Building an Interval from the Pivot

The process is a clever algebraic inversion.

Step 1: Start with a probability statement about the pivot. For a 95% interval, we know 95% of Z-statistics will fall between the critical values -1.96 and +1.96.

$$P(-1.96 \le Z \le 1.96) = 0.95$$
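The value 1.96 is the 97.5th percentile of the standard normal distribution, which leaves 2.5% in each tail. As a quick check, the sketch below recovers it with Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an assumption for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_critical(level):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - level
    return std_normal.inv_cdf(1 - alpha / 2)


z95 = z_critical(0.95)
# Probability mass between -z and +z under the standard normal curve.
central_mass = std_normal.cdf(z95) - std_normal.cdf(-z95)
print(f"z for 95%: {z95:.4f}")
print(f"P(-z <= Z <= z): {central_mass:.4f}")
```

Calling `z_critical(0.99)` instead gives a larger value (about 2.576), since a higher confidence level needs more of the distribution between the two cutoffs.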

Step 2: Substitute the formula for the pivot.

$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le 1.96\right) = 0.95$$

Step 3: Isolate the unknown parameter $\mu$ in the middle of the inequality.

Multiply all parts by the standard error:

$$P\left(-1.96 \cdot \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$

Subtract $\bar{X}$ from all parts:

$$P\left(-\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \le - \mu \le -\bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$

Multiply by -1 (which flips the direction of the inequalities):

$$P\left(\bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$

Step 4: Rearrange to the standard format.

$$P\left( \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \right) = 0.95$$

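This gives us the lower and upper bounds of our 95% confidence interval: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$. Note that $\mu$ is fixed; the random quantities in this probability statement are the two endpoints, which move with $\bar{X}$ from sample to sample.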
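As a worked illustration of the final formula, the sketch below computes the interval for one small sample. The measurements are made up, $\sigma = 2.0$ is simply assumed to be known, and the function name `z_interval` is hypothetical. Replacing 1.96 with the general critical value $z_{\alpha/2}$ lets the same code handle other confidence levels.

```python
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Known-sigma interval for the mean: x_bar -/+ z * sigma / sqrt(n)."""
    n = len(sample)
    x_bar = fmean(sample)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 when level=0.95
    margin = z * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin


# Made-up measurements; sigma = 2.0 is assumed known for this example.
sample = [12.1, 9.8, 11.4, 10.7, 13.0, 10.2, 11.9, 9.5, 12.6]
lower, upper = z_interval(sample, sigma=2.0)
print(f"x_bar = {fmean(sample):.3f}")
print(f"95% CI for mu: [{lower:.3f}, {upper:.3f}]")
```

The interval is centered on $\bar{X}$, and its half-width depends only on $\sigma$, $n$, and the chosen confidence level.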

Part 3: The General Recipe for a Confidence Interval

The General Formula for a Confidence Interval

When the pivot has a symmetric distribution (such as Z or t), the structure is always the same:

$$\text{Point Estimate} \pm \text{Margin of Error}$$

$$\hat{\theta} \pm (\text{Critical Value}) \times (\text{Standard Error of } \hat{\theta})$$

The Three Ingredients

Point Estimate ($\hat{\theta}$): Your single best guess for the parameter (e.g., $\bar{X}$, $\hat{\beta}_j$). This is the center of your interval.
Standard Error ($\text{se}(\hat{\theta})$): The estimated standard deviation of your estimator's sampling distribution (e.g., $s/\sqrt{n}$, $\text{se}(\hat{\beta}_j)$). This measures the "shakiness" of your estimate. A smaller standard error leads to a narrower, more precise interval.
Critical Value ($z_{\alpha/2}$ or $t_{\alpha/2, df}$): A number from a known distribution (Z or t) that determines your level of confidence. A higher confidence level (e.g., 99% vs 95%) requires a larger critical value, resulting in a wider interval. This is the "confidence dial."
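The three ingredients can be combined in a few lines. The sketch below uses a Z critical value only (a t-based interval would swap in a t quantile) and reuses the point estimate $\hat{\beta}_1 = 1.2$ from Part 1 with two made-up standard errors, 0.05 and 3.0, to show how the standard error and the "confidence dial" each change the width. The function name `symmetric_interval` is an assumption for illustration.

```python
from statistics import NormalDist


def symmetric_interval(estimate, std_error, level=0.95):
    """Point estimate -/+ (critical value) * (standard error), using Z."""
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = crit * std_error
    return estimate - margin, estimate + margin


beta_hat = 1.2  # same point estimate in every case
intervals = {}
for se in (0.05, 3.0):              # made-up standard errors
    for level in (0.90, 0.95, 0.99):
        lo, hi = symmetric_interval(beta_hat, se, level)
        intervals[(se, level)] = (lo, hi)
        print(f"se={se:4.2f}  level={level:.2f}  CI=[{lo:7.3f}, {hi:7.3f}]")
```

With the small standard error the 95% interval is roughly 1.10 to 1.30; with the large one it is roughly -4.68 to 7.08, even though the point estimate is identical. Within each standard error, moving from 90% to 99% widens the interval.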

What's Next? Applying the Recipe

We've now mastered the general theory of how to build a confidence interval using the pivotal method.

In the next lesson, we will apply this general recipe to the two most important parameters we deal with: the population mean ($\mu$) and the population variance ($\sigma^2$). We will derive their specific CI formulas, paying close attention to which pivot (Z, t, or Chi-Squared) is the right tool for each job.

Up Next: Let's Apply the Recipe: Deriving CIs for Mean and Variance