Lesson 3.1: Judging Estimators: The Property of Unbiasedness

We've defined an estimator as a 'recipe' for a guess. Now, we define the first and most important criterion for a good recipe: Unbiasedness. We will rigorously define bias, prove the unbiasedness of the sample mean, and uncover the subtle bias in the sample variance, leading to the famous 'n-1' correction.

Part 1: The First Criterion of a Good Estimator

If an estimator is our "recipe" for guessing a true parameter, our first question should be simple: "On average, does this recipe give us the right answer?" If the answer is yes, we call the estimator unbiased.

The Dartboard Analogy

Imagine the true parameter $\theta$ is the bullseye of a dartboard. Every time you collect a sample and calculate an estimate, you throw a dart.

Unbiased & High Variance

Your darts are scattered all around the bullseye. Your aim is good on average, but your hand is shaky.

Biased & Low Variance

Your darts are tightly clustered, but they are centered on the '20' instead of the bullseye. Your hand is steady, but your aim is off.

Definition: Unbiasedness & Bias

An estimator $\hat{\theta}$ is unbiased if its expected value (the average of all possible estimates it could produce) is equal to the true parameter $\theta$.

$$E[\hat{\theta}] = \theta$$

The Bias of an estimator is the difference between its expectation and the truth.

$$\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

An estimator is unbiased if and only if its bias is zero.
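
To make the definition concrete, here is a minimal Monte Carlo sketch (Python with NumPy; the uniform population, the parameter values, and the helper name `estimate_bias` are illustrative choices, not part of the lesson's formal material). It approximates $E[\hat{\theta}]$ by averaging an estimator over many simulated samples, then subtracts the true $\theta$:

```python
import numpy as np

def estimate_bias(estimator, true_theta, sample_size, n_sims=100_000, seed=0):
    """Approximate Bias(theta_hat) = E[theta_hat] - theta by simulation."""
    rng = np.random.default_rng(seed)
    # Population: Uniform(0, theta), so theta is the unknown upper endpoint.
    samples = rng.uniform(0.0, true_theta, size=(n_sims, sample_size))
    estimates = np.apply_along_axis(estimator, 1, samples)
    return estimates.mean() - true_theta

theta = 10.0
print(estimate_bias(np.max, theta, sample_size=20))                  # clearly negative: biased
print(estimate_bias(lambda s: 2 * s.mean(), theta, sample_size=20))  # ~0: unbiased
```

Both recipes target the same $\theta$: the sample maximum can never exceed $\theta$, so it is biased downward, while $2\bar{X}$ has expectation $2 \cdot \theta/2 = \theta$ and is unbiased.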

Part 2: Interrogating Our Estimators

Case 1: The Sample Mean ($\bar{X}$)

Proof: The Sample Mean is Unbiased

We want to prove that $E[\bar{X}] = \mu$.

Step 1: Apply the definition of expectation to the estimator's formula.

$$E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right]$$

Step 2: Use the linearity of expectation. We can pull constants and sums out of the expectation.

$$E[\bar{X}] = \frac{1}{n}\sum_{i=1}^n E[X_i]$$

Step 3: Use the i.i.d. sample assumption. Each $X_i$ is a random draw from the *same* population, so by definition, $E[X_i] = \mu$ for all $i$.

$$E[\bar{X}] = \frac{1}{n}\sum_{i=1}^n \mu = \frac{1}{n}(n \cdot \mu)$$

Step 4: Conclude.

$$E[\bar{X}] = \mu$$

Verdict: The sample mean is a perfectly unbiased estimator of the population mean.
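
A quick empirical check of this result (a sketch assuming NumPy; the exponential population and the numbers below are arbitrary illustrations): draw many samples, compute $\bar{X}$ for each, and confirm that the average of those sample means sits on top of $\mu$, even though any single $\bar{X}$ misses it.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, n, n_sims = 3.0, 25, 200_000

# Each row is one sample of size n from a (skewed) Exponential population with mean mu.
samples = rng.exponential(scale=mu, size=(n_sims, n))
sample_means = samples.mean(axis=1)

print(sample_means[:3])      # individual estimates scatter around 3.0
print(sample_means.mean())   # but their average is ~3.0: unbiased
```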

Case 2: The "Naive" Sample Variance ($\tilde{s}^2$)

Let's test the intuitive but flawed estimator for the variance, where we divide by $n$.

Proof: The Naive Sample Variance is Biased

Let's find the expectation of $\tilde{s}^2 = \frac{1}{n}\sum (X_i - \bar{X})^2$.

Step 1: A useful trick. Add and subtract $\mu$ inside the square.

$$\sum(X_i - \bar{X})^2 = \sum\bigl((X_i - \mu) - (\bar{X} - \mu)\bigr)^2$$

Step 2: Expand the square.

$$= \sum\left[(X_i - \mu)^2 - 2(X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2\right]$$

Step 3: Distribute the summation and simplify. (Note that $\sum(X_i - \mu) = n(\bar{X} - \mu)$.)

$$= \sum(X_i - \mu)^2 - 2n(\bar{X} - \mu)^2 + n(\bar{X} - \mu)^2 = \sum(X_i - \mu)^2 - n(\bar{X} - \mu)^2$$

Step 4: Take the expectation of the whole expression.

$$E\left[\sum(X_i - \bar{X})^2\right] = E\left[\sum(X_i - \mu)^2\right] - n\,E\left[(\bar{X} - \mu)^2\right]$$

By definition, $E[(X_i - \mu)^2] = \text{Var}(X_i) = \sigma^2$, and because the sample is i.i.d., $E[(\bar{X} - \mu)^2] = \text{Var}(\bar{X}) = \sigma^2/n$. Substituting these in:

$$E\left[\sum(X_i - \bar{X})^2\right] = n\sigma^2 - n\left(\frac{\sigma^2}{n}\right) = (n-1)\sigma^2$$

Step 5: Find the final expectation for our estimator.

$$E[\tilde{s}^2] = E\left[\frac{1}{n}\sum(X_i - \bar{X})^2\right] = \frac{1}{n}(n-1)\sigma^2 = \left(\frac{n-1}{n}\right)\sigma^2$$

Verdict: $E[\tilde{s}^2] \ne \sigma^2$. The naive estimator is biased, systematically underestimating the true variance.
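
The same kind of simulation exposes the bias (a sketch assuming NumPy; the normal population and $\sigma^2 = 4$ are arbitrary): the long-run average of the naive estimator lands on $\frac{n-1}{n}\sigma^2$, not on $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, n, n_sims = 4.0, 10, 200_000

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(n_sims, n))
naive_var = samples.var(axis=1, ddof=0)  # ddof=0 -> divide by n (the "naive" recipe)

print(naive_var.mean())        # ~3.6, systematically below 4.0
print((n - 1) / n * sigma2)    # exactly what the algebra predicts: 3.6
```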

Bessel's Correction: The n-1 Fix

To fix the bias, we multiply our biased result by $n/(n-1)$. This leads to the unbiased sample variance estimator, $s^2$, which you should always use:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$
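
In practice this is usually a one-argument change. For example, NumPy's `var` and `std` default to the naive divide-by-$n$ recipe, so you must ask for Bessel's correction explicitly via `ddof=1` (a small sketch with illustrative numbers):

```python
import numpy as np

x = np.array([2.1, 1.8, 3.4, 2.9, 2.4])

biased   = np.var(x, ddof=0)  # divides by n     (NumPy's default)
unbiased = np.var(x, ddof=1)  # divides by n - 1 (Bessel's correction)

print(biased, unbiased)  # the ddof=1 value is larger: same sum of squares, smaller divisor
```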

Part 3: Asymptotic Unbiasedness

Sometimes, an estimator is biased in small samples (like $\tilde{s}^2$), but the bias *disappears* as the sample size $n$ grows to infinity. This is called asymptotic unbiasedness.

Definition: Asymptotic Unbiasedness

An estimator $\hat{\theta}$ is asymptotically unbiased if the limit of its expected value is the true parameter:

$$\lim_{n \to \infty} E[\hat{\theta}_n] = \theta$$

Our biased variance estimator $\tilde{s}^2$ is a perfect example. Its expectation is $\left(\frac{n-1}{n}\right)\sigma^2$. As $n \to \infty$, the fraction $\frac{n-1}{n}$ goes to 1, so the bias vanishes.
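
A short sketch of the vanishing bias (assuming NumPy; the grid of sample sizes and $\sigma^2 = 4$ are arbitrary): for each $n$ we compare the simulated expectation of $\tilde{s}^2$ with the theoretical value $\frac{n-1}{n}\sigma^2$, and both march toward $\sigma^2$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, n_sims = 4.0, 50_000

for n in (2, 5, 20, 100):
    samples = rng.normal(scale=np.sqrt(sigma2), size=(n_sims, n))
    simulated = samples.var(axis=1, ddof=0).mean()   # Monte Carlo estimate of E[naive variance]
    theoretical = (n - 1) / n * sigma2
    print(f"n={n:>3}  theory={theoretical:.3f}  simulated={simulated:.3f}")
```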

Part 4: Connecting to the Real World (ML & Finance)

The Machine Learning Connection: The Bias-Variance Trade-off

This lesson is the absolute heart of the Bias-Variance Trade-off, the single most important concept in applied machine learning.

  • Our OLS Estimator (Module 2): The Gauss-Markov theorem proved that OLS is BLUE (Best Linear **Unbiased** Estimator). We intentionally chose an estimator with zero bias.
  • The Trade-off: In ML, we often find that strictly enforcing zero bias (like OLS) leads to estimators with very high **variance** (they are "shaky" and overfit the sample data).
  • Regularization (Ridge/Lasso): ML techniques like Ridge and Lasso *intentionally introduce a small amount of bias* into the OLS estimates. Why? Because doing so can *dramatically* reduce the estimator's variance. This results in a model that is technically "biased" but makes far more accurate and stable predictions on new, unseen data (see the simulation sketch after this list).
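
Here is a minimal sketch of that trade-off (assuming NumPy and scikit-learn; the data-generating process, the correlation between predictors, and `alpha=5.0` are all illustrative choices). Two highly correlated predictors make the OLS coefficients unbiased but "shaky"; Ridge shrinks them, accepting a little bias in exchange for a much smaller variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(7)
beta = np.array([2.0, -1.0])   # true coefficients
n, n_sims = 30, 2_000

ols_coefs, ridge_coefs = [], []
for _ in range(n_sims):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.1 * rng.normal(size=n)            # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = X @ beta + rng.normal(scale=1.0, size=n)

    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=5.0).fit(X, y).coef_)

ols_coefs, ridge_coefs = np.array(ols_coefs), np.array(ridge_coefs)
print("OLS   mean:", ols_coefs.mean(axis=0), "var:", ols_coefs.var(axis=0))
print("Ridge mean:", ridge_coefs.mean(axis=0), "var:", ridge_coefs.var(axis=0))
# Typical pattern: OLS means sit near (2, -1) but with large variance;
# Ridge means are pulled away from (2, -1) (bias) yet vary far less across samples.
```
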
The Quant Finance Connection: Estimation Risk vs. Model Bias

In quantitative finance, "bias" means your model is systematically wrong: its estimates miss the truth on average, not just from one sample to the next.

  • Model Bias: If your risk model (e.g., CAPM) is biased, your estimate of an asset's Beta ($\hat{\beta}$) is, on average, wrong. For example, $E[\hat{\beta}] = 1.1$ when the true $\beta$ is 0.8. This is a critical failure that will lead to incorrect hedging and massive losses. This is why we rely on unbiased estimators like OLS.
  • Estimation Risk: This is the "shakiness" of our unbiased estimator (its variance). Unbiasedness (Lesson 3.1) and Efficiency (Lesson 3.2) are the two lenses we use to manage it: first demand the right answer on average, then ask for the lowest possible variance among such estimators. Our goal as quants is to find the estimator with the lowest possible estimation risk (i.e., the one that is BLUE); both ideas appear in the simulation sketch below.
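
A final sketch ties the two bullets together (assuming NumPy; the single-factor return model, the true $\beta = 0.8$, and the noise levels are made-up illustrations). Across many simulated years of daily returns, the OLS beta is right on average (no model bias), but any single year's estimate wobbles around the truth (estimation risk).

```python
import numpy as np

rng = np.random.default_rng(3)
true_beta, n_days, n_sims = 0.8, 250, 20_000

betas = np.empty(n_sims)
for k in range(n_sims):
    mkt = rng.normal(0.0005, 0.01, size=n_days)                    # market returns
    asset = true_beta * mkt + rng.normal(0.0, 0.02, size=n_days)   # asset returns + idiosyncratic noise
    # OLS slope of asset on market = sample covariance / sample variance.
    betas[k] = np.cov(asset, mkt)[0, 1] / np.var(mkt, ddof=1)

print(betas.mean())  # ~0.8: unbiased, no model bias
print(betas.std())   # the estimation risk: how far a single year's beta-hat can stray
```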

What's Next? Shaky Hands vs. Steady Hands

Unbiasedness tells us if our aim is correct on average. But it doesn't tell us how "shaky" our hands are. Two different estimators can both be unbiased, but one might produce estimates that are wildly scattered while the other produces estimates that are tightly clustered around the bullseye.

In the next lesson, we will introduce the second key property of a good estimator: **Efficiency**. This is the formal measure of an estimator's variance, or "steadiness."