Lesson 3.3: Consistency and Sufficiency

We complete our study of estimator properties with two more crucial concepts. Consistency is the 'big data guarantee'—the assurance that our estimator improves as we get more data. Sufficiency is a powerful idea about whether an estimator uses all the information available in a sample, forming the basis for data compression and feature engineering.

Part 1: Consistency - The Guarantee of Big Data

We've discussed properties that hold for any sample size (unbiasedness, efficiency). Now we ask: what happens as our sample size $n$ goes to infinity? This is the domain of asymptotic properties.

The Core Idea: A **consistent** estimator is one that is guaranteed to converge to the true parameter value as the sample size grows. It's the mathematical proof that "more data is better."

Definition: Consistency

An estimator $\hat{\theta}_n$ is a **consistent estimator** of $\theta$ if it converges in probability to $\theta$.

$$\text{plim}_{n \to \infty} (\hat{\theta}_n) = \theta$$

This means the sampling distribution of the estimator collapses into a single spike right on top of the true parameter as $n$ grows.
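Spelled out, "converges in probability" means that for any tolerance $\epsilon > 0$, the chance of the estimate missing the target by more than $\epsilon$ vanishes as the sample grows:

$$\lim_{n \to \infty} P\left(|\hat{\theta}_n - \theta| > \epsilon\right) = 0 \quad \text{for every } \epsilon > 0.$$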

A Simple Test for Consistency

Proving convergence in probability directly can be hard. A simpler sufficient condition is to check two things:

  1. Does the estimator's bias go to zero? ($\lim_{n \to \infty} \text{Bias}(\hat{\theta}_n) = 0$)
  2. Does the estimator's variance go to zero? ($\lim_{n \to \infty} \text{Var}(\hat{\theta}_n) = 0$)

If both are true, the estimator is consistent.
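Why does this shortcut work? Together the two conditions force the mean squared error to zero, and Markov's inequality (applied to the squared error) then caps the probability of a large miss:

$$P\left(|\hat{\theta}_n - \theta| > \epsilon\right) \le \frac{E\left[(\hat{\theta}_n - \theta)^2\right]}{\epsilon^2} = \frac{\text{Bias}(\hat{\theta}_n)^2 + \text{Var}(\hat{\theta}_n)}{\epsilon^2} \to 0.$$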

Example (Sample Mean): We know $E[\bar{X}] = \mu$ (bias is 0) and $\text{Var}(\bar{X}) = \sigma^2/n$. Since $\lim_{n \to \infty} \sigma^2/n = 0$, the sample mean is a consistent estimator for $\mu$. This is the Law of Large Numbers in action.
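To watch the collapse happen numerically, here is a minimal simulation sketch (NumPy assumed; the normal distribution, sample sizes, and replication count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
true_mu, true_sigma = 5.0, 2.0
n_reps = 10_000  # number of simulated datasets per sample size

for n in [10, 100, 1_000, 10_000]:
    # Draw n_reps datasets of size n and compute each dataset's sample mean
    samples = rng.normal(true_mu, true_sigma, size=(n_reps, n))
    means = samples.mean(axis=1)

    # The spread of the sample means shrinks like sigma / sqrt(n):
    # the sampling distribution collapses onto the true mean.
    print(f"n={n:>6}: mean of estimates={means.mean():.4f}, "
          f"std of estimates={means.std():.4f} "
          f"(theory: {true_sigma / np.sqrt(n):.4f})")
```

The printed spread of the estimates should track $\sigma/\sqrt{n}$, which is exactly the "variance goes to zero" condition above.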

Part 2: Sufficiency - The Art of Data Compression

Imagine you have a dataset with a million data points. Is it possible to summarize that entire dataset into just one or two numbers without losing *any information* about the parameter you want to estimate? If so, those summary numbers are called **sufficient statistics**.

The Core Idea: A statistic is **sufficient** if it contains all the information about the parameter that was present in the original sample. Once you have the sufficient statistic, the original data is redundant.

Example: The Perfect Summary

Let $X_1, \dots, X_n$ be $n$ coin flips (Bernoulli trials) used to estimate the probability of heads, $p$.

The statistic $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$, the total number of heads, is a **sufficient statistic** for $p$.

Why? If I tell you I got 60 heads in 100 flips, you can estimate $\hat{p} = 0.6$. Knowing the exact sequence (e.g., HTH... vs HHH...) provides no extra information about $p$. The sum is a lossless compression of the data for the purpose of estimating $p$.
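To make "no extra information" precise, condition on the sum. For any particular sequence $\mathbf{x}$ containing $t$ heads,

$$P(\mathbf{X} = \mathbf{x} \mid T = t) = \frac{p^t (1-p)^{n-t}}{\binom{n}{t} p^t (1-p)^{n-t}} = \frac{1}{\binom{n}{t}},$$

which does not involve $p$ at all: once the total is known, every sequence with that total is equally likely, so the raw data have nothing left to say about $p$. This is exactly the definition of sufficiency (and the Fisher–Neyman factorization theorem gives a quicker way to spot it in general).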

The Rao-Blackwell Theorem: The Path to the Best Estimator

Why do we care about sufficiency? Because it's the key to finding the most efficient estimator (the MVUE).

The Rao-Blackwell theorem provides a method: if you have a simple unbiased estimator, you can improve it (reduce its variance) by conditioning it on a sufficient statistic. This process "squeezes out" the irrelevant noise, leaving you with a more efficient estimator that only uses the information that matters.
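As an illustrative sketch of this "squeezing" in the coin-flip setting (the setup and variable names are ours, chosen for the example): start with the crude but unbiased estimator $\hat{p}_{\text{crude}} = X_1$, just the first flip, and condition it on the sufficient statistic $T = \sum X_i$. By symmetry, $E[X_1 \mid T = t] = t/n$, which is the sample mean, and a quick simulation shows the variance drop:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n, n_reps = 0.3, 50, 100_000

# Each row is one simulated dataset of n Bernoulli(p_true) flips
flips = rng.binomial(1, p_true, size=(n_reps, n))

# Crude unbiased estimator: just the first flip of each dataset
crude = flips[:, 0].astype(float)

# Rao-Blackwellized estimator: E[X_1 | sum of flips] = (sum of flips) / n,
# i.e. the sample mean
rao_blackwell = flips.mean(axis=1)

print(f"crude:          mean={crude.mean():.3f}, var={crude.var():.4f}")
print(f"rao-blackwell:  mean={rao_blackwell.mean():.3f}, var={rao_blackwell.var():.4f}")
# Both estimators are unbiased (means near 0.3), but conditioning on the
# sufficient statistic cuts the variance from p(1-p) down to p(1-p)/n.
```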

The Payoff: Why These Properties are Foundational
    • Consistency (The Justification for Big Data): Consistency is the theoretical bedrock of modern machine learning and quantitative finance. It is the formal guarantee that our models and backtests will, with enough data, converge to the true underlying reality. Without consistency, "big data" would be useless.
    • Sufficiency (The Justification for Feature Engineering): Sufficiency is the formal theory behind data reduction. Every time an ML engineer creates a "feature" (like an average, a sum, or a ratio) from raw data, they are implicitly creating a statistic. The goal of good feature engineering is to create statistics that are (as close as possible to) sufficient, capturing the essential information in a much lower dimension.

What's Next? Time to Build

We have now completed our tour of the "Big Four" properties of good estimators: **Unbiasedness, Efficiency, Consistency, and Sufficiency.**

We know how to judge a recipe. It's finally time to learn how to cook.

In the next lesson, we will begin Act II of this module and learn our first systematic method for actually creating estimators from scratch: the simple and intuitive **Method of Moments (MoM)**.

Up Next: Act II: How to Build an Estimator with Method of Moments