Lesson 3.4: Method of Moments (MoM) Estimation

We now learn our first 'recipe' for creating estimators. The Method of Moments (MoM) is the oldest and most intuitive estimation technique. Its principle is simple: what we observe in our sample should mirror what is true in the population. We'll use this idea to derive estimators for the mean and variance.

Part 1: The Principle of Matching Moments

The entire philosophy of the Method of Moments, developed by Karl Pearson, can be summarized in one sentence:

The Core Idea: Match the sample moments (which you can calculate from your data) to the population moments (which are functions of the parameters you want to find), and then solve for the parameters.

This principle is built directly on the Law of Large Numbers, which guarantees that our sample moments are consistent estimators of the population moments.
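To make this concrete, here is a minimal simulation sketch (the normal parameters, sample sizes, and seed are arbitrary choices for illustration): as $n$ grows, the first two sample moments settle near the corresponding population moments.

```python
import numpy as np

# Minimal illustration of the LLN for moments: for X ~ N(2, 3^2),
# the population moments are E[X] = 2 and E[X^2] = sigma^2 + mu^2 = 13.
rng = np.random.default_rng(0)

for n in (10, 1_000, 100_000):
    x = rng.normal(loc=2.0, scale=3.0, size=n)
    m1 = x.mean()          # first sample moment
    m2 = (x**2).mean()     # second sample moment
    print(f"n={n:>7}:  m1' = {m1:6.3f} (target 2),  m2' = {m2:6.3f} (target 13)")
```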

The Method of Moments (MoM) Recipe

To find $k$ unknown parameters, you follow three steps:

  1. Step 1: Express Population Moments: Write the first $k$ population moments ($\mu'_j = E[X^j]$) as functions of the unknown parameters ($\theta_1, \dots, \theta_k$).
  2. Step 2: Calculate Sample Moments: Calculate the first $k$ sample moments ($m'_j = \frac{1}{n}\sum X_i^j$) from your data.
  3. Step 3: Equate and Solve: Set the corresponding moments equal to each other ($\mu'_j = m'_j$) to create a system of $k$ equations, and solve for your $k$ parameters.
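As a preview of Part 2, here is a symbolic sketch of the recipe for the normal case, assuming SymPy is available; the symbols m1 and m2 stand for the first two sample moments, and the printed solution is exactly the pair of estimators we derive by hand below.

```python
import sympy as sp

# Unknown parameters and placeholder symbols for the sample moments
mu, sigma2, m1, m2 = sp.symbols("mu sigma2 m1 m2")

# Step 1: population moments as functions of the parameters (normal case)
# Step 3: equate them to the sample moments and solve the system
eq1 = sp.Eq(mu, m1)               # E[X]   = mu
eq2 = sp.Eq(sigma2 + mu**2, m2)   # E[X^2] = sigma^2 + mu^2

print(sp.solve([eq1, eq2], [mu, sigma2], dict=True))
# [{mu: m1, sigma2: -m1**2 + m2}]  i.e. mu_hat = m1', sigma2_hat = m2' - (m1')^2
```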

Part 2: MoM in Action - Deriving Estimators

Case 1: Estimating the Mean ($\mu$) of a Normal Distribution

We need to find one parameter ($k=1$), so we need one moment equation.

  • Population Moment 1: $E[X] = \mu$
  • Sample Moment 1: $\frac{1}{n}\sum X_i = \bar{X}$
  • Equate and Solve: $\mu = \bar{X}$

Result: The MoM estimator for $\mu$ is $\hat{\mu}_{MoM} = \bar{X}$.
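As a quick numeric sketch (the five observations are made up for illustration), the MoM estimate of $\mu$ is just the sample mean:

```python
import numpy as np

x = np.array([4.1, 5.0, 3.7, 6.2, 5.5])   # hypothetical sample
mu_hat_mom = x.mean()                     # MoM estimate of mu
print(mu_hat_mom)                         # 4.9
```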

Case 2: Estimating Both Mean ($\mu$) and Variance ($\sigma^2$)
We need to find two parameters ($k=2$), so we need two moment equations.

No-Skip Derivation for μ and σ²

Equation 1 (First Moments):

This is the same as above, giving us our first result: $\hat{\mu}_{MoM} = \bar{X}$.

Equation 2 (Second Moments):

  • Population Moment 2: From the variance identity, $E[X^2] = \text{Var}(X) + (E[X])^2 = \sigma^2 + \mu^2$.
  • Sample Moment 2: $\frac{1}{n}\sum X_i^2$.
  • Equate: $\sigma^2 + \mu^2 = \frac{1}{n}\sum X_i^2$.

Solve the System:

We substitute our first result ($\hat{\mu} = \bar{X}$) into the second equation:

$\hat{\sigma}^2 + (\bar{X})^2 = \frac{1}{n}\sum X_i^2$

Solving for $\hat{\sigma}^2$ gives:

$\hat{\sigma}^2_{MoM} = \frac{1}{n}\sum X_i^2 - \bar{X}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$

(The second equality follows from expanding the square: $\sum(X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2$.)
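A quick numeric check of that last algebraic step, using the same made-up sample as before; note that NumPy's np.var uses ddof=0 by default, so it computes exactly this divide-by-$n$ MoM estimator.

```python
import numpy as np

x = np.array([4.1, 5.0, 3.7, 6.2, 5.5])   # hypothetical sample
xbar = x.mean()

lhs = (x**2).mean() - xbar**2        # (1/n) * sum(x_i^2) - xbar^2
rhs = ((x - xbar)**2).mean()         # (1/n) * sum((x_i - xbar)^2)

print(lhs, rhs, np.var(x))           # all three are ~0.828
```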

The Critical Insight

The MoM estimator for the variance is $\hat{\sigma}^2_{MoM} = \frac{1}{n}\sum(X_i - \bar{X})^2$.

This is the **biased** estimator we identified in Lesson 3.1: it divides by $n$ rather than $n-1$. This is our first concrete demonstration that while MoM is intuitive and easy, it does not always produce unbiased estimators.
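Here is a small Monte Carlo sketch of that bias (the sample size, true variance, seed, and number of replications are arbitrary choices): averaged over many samples of size $n = 5$, the MoM estimator comes in around $\frac{n-1}{n}\sigma^2 = 3.2$ rather than the true $\sigma^2 = 4$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma2, reps = 5, 4.0, 200_000

# Draw `reps` independent samples of size n from N(0, sigma^2)
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

mom_var = samples.var(axis=1, ddof=0)        # divide by n   (MoM, biased)
unbiased_var = samples.var(axis=1, ddof=1)   # divide by n-1 (unbiased)

print(mom_var.mean())        # ~3.2, systematically below sigma^2 = 4
print(unbiased_var.mean())   # ~4.0, centered on the true variance
```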

Report Card: Method of Moments
    • Consistent: YES. Because sample moments converge to population moments (by the WLLN), MoM estimators are generally consistent.
    • Simple to Calculate: YES. The logic is straightforward and usually only involves basic algebra.
    • Unbiased: NOT ALWAYS. As we saw with the variance, MoM estimators can be biased in finite samples.
    • Efficient: RARELY. MoM only uses the first few moments of the data, potentially ignoring valuable information contained in the full shape of the distribution. This often leads to estimators with higher variance than other methods.

What's Next? A More Powerful Engine

The Method of Moments is a great starting point, but its potential for bias and inefficiency means it's not the workhorse of modern statistics. We need a more powerful, more principled method.

In the next lesson, we will introduce the undisputed champion of estimation techniques: **Maximum Likelihood Estimation (MLE)**. MLE uses the entire probability distribution to find the parameter values that make our observed data "most likely," and it produces estimators with outstanding properties.

Up Next: Let's Learn the Champion: Maximum Likelihood Estimation