Lesson 3.11: The Theory of the 'Best' Test: The Neyman-Pearson Lemma

We know how to conduct a hypothesis test. But how do we know if our test is the best possible one? This lesson introduces the Neyman-Pearson Lemma, the fundamental theorem that provides a recipe for constructing the 'Most Powerful' test—the test that maximizes our chances of finding a true effect (Power) for a fixed tolerance for false positives (α).

Part 1: The Quest for the Most Powerful Test

In our courtroom analogy, we have a tradeoff: for a fixed standard of evidence ($\alpha$), we want to maximize the probability of correctly convicting a guilty person (Power, $1-\beta$). The Neyman-Pearson (NP) Lemma tells us exactly how to build the test that achieves this.

First, the lemma works in a very specific scenario: a **simple null vs. a simple alternative** hypothesis.

Simple Hypothesis

Specifies an exact value for the parameter. Ex: $H_0: \mu = 100$

Composite Hypothesis

Specifies a range of values for the parameter. Ex: $H_1: \mu > 100$

Part 2: The Core Weapon: The Likelihood Ratio

The test that the NP Lemma prescribes is built on a simple, powerful idea: the **Likelihood Ratio**.

The 'Strength of Evidence' Score

The Likelihood Ratio, $\Lambda$, simply asks: "How many times more likely is my observed data under the alternative hypothesis compared to the null hypothesis?"

Definition: The Likelihood Ratio (Λ)

$$\Lambda(\mathbf{x}) = \frac{L(\theta_1 \mid \mathbf{x})}{L(\theta_0 \mid \mathbf{x})} = \frac{\text{Likelihood of Data if } H_1 \text{ is True}}{\text{Likelihood of Data if } H_0 \text{ is True}}$$

A large ratio ($\Lambda \gg 1$) means our data strongly supports $H_1$. A small ratio ($\Lambda \ll 1$) means our data is more consistent with $H_0$.
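To make the ratio concrete, here is a minimal sketch for i.i.d. Normal data with known standard deviation, using hypothetical values $\mu_0 = 100$ and $\mu_1 = 101$ (the function names and the sample are illustrative, not part of any standard library):

```python
import math

# Density of a single Normal(mu, sigma) observation.
def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Likelihood ratio Lambda = L(theta1 | data) / L(theta0 | data)
# for i.i.d. Normal(mu, sigma) observations, both hypotheses simple.
def likelihood_ratio(data, mu0, mu1, sigma=1.0):
    l0 = math.prod(normal_pdf(x, mu0, sigma) for x in data)  # likelihood under H0
    l1 = math.prod(normal_pdf(x, mu1, sigma) for x in data)  # likelihood under H1
    return l1 / l0

# A hypothetical sample sitting close to mu1 = 101, so Lambda >> 1.
sample = [101.2, 100.8, 101.5, 100.9]
print(likelihood_ratio(sample, mu0=100.0, mu1=101.0))
```

Because the sample mean sits near $\mu_1$, the ratio comes out well above 1: the data speak loudly for $H_1$.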

Part 3: The Neyman-Pearson Lemma

Theorem: The Neyman-Pearson (NP) Lemma

When testing a simple null $H_0: \theta = \theta_0$ against a simple alternative $H_1: \theta = \theta_1$, the **Most Powerful (MP) test** at a significance level $\alpha$ is a Likelihood Ratio Test (LRT).

The test has the following decision rule:

Reject $H_0$ if $\Lambda(\mathbf{x}) > k$

Where the critical value $k$ is chosen such that the probability of a Type I error is exactly $\alpha$:

$$P(\Lambda(\mathbf{X}) > k \mid \theta = \theta_0) = \alpha$$
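In the Normal-mean case, the log likelihood ratio is an increasing function of the sample mean $\bar{x}$ (when $\mu_1 > \mu_0$), so "reject when $\Lambda > k$" is equivalent to "reject when $\bar{x} > c$", and $c$ can be calibrated directly. A sketch under assumed values ($\mu_0 = 100$, $\sigma = 1$, $n = 25$, $\alpha = 0.05$), with a Monte Carlo check that the Type I error rate really lands at $\alpha$:

```python
import random
import statistics
from statistics import NormalDist

alpha, mu0, sigma, n = 0.05, 100.0, 1.0, 25

# Calibrate the cutoff c so that P(xbar > c | H0) = alpha exactly:
# c = mu0 + z_{1-alpha} * sigma / sqrt(n).
c = mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / n ** 0.5

# Monte Carlo check: simulate data under H0 and measure how often we reject.
random.seed(0)
trials = 20_000
rejections = 0
for _ in range(trials):
    xbar = statistics.fmean(random.gauss(mu0, sigma) for _ in range(n))
    if xbar > c:
        rejections += 1

print(c)                    # the critical value on the xbar scale
print(rejections / trials)  # empirical Type I error rate, close to 0.05
```

The simulated rejection rate under $H_0$ hovers around 0.05, confirming that the cutoff absorbs exactly the tolerated false-positive budget.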

This theorem is a guarantee: No other test you can possibly invent for this problem will have a higher Power (a lower $\beta$) for the same $\alpha$ level.
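You can see the guarantee at work by pitting the LRT against a deliberately wasteful competitor that ignores most of the data. Both tests below hold $\alpha = 0.05$ exactly, but the one using all the observations wins on Power. The numbers here ($\mu_0 = 100$, a small hypothetical effect $\mu_1 = 100.4$, $\sigma = 1$) are illustrative choices, picked so the power gap is easy to see:

```python
from statistics import NormalDist

Z = NormalDist()
alpha, mu0, mu1, sigma = 0.05, 100.0, 100.4, 1.0

def power(n_used):
    # Calibrate the cutoff "reject if xbar > c" under H0 for n_used observations...
    c = mu0 + Z.inv_cdf(1 - alpha) * sigma / n_used ** 0.5
    # ...then evaluate the rejection probability under H1.
    return 1 - Z.cdf((c - mu1) / (sigma / n_used ** 0.5))

print(power(25))  # MP test: uses all 25 observations
print(power(5))   # throws away 20 observations: still valid, but much weaker
```

Both tests have a 5% false-positive rate, but the test that discards data detects the true effect far less often. The NP Lemma says this is no accident: no valid test at level $\alpha$ can beat the full-data LRT.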

What's Next? The Real World of Composite Hypotheses

The Neyman-Pearson Lemma gives us the "best" test for a simple vs. simple hypothesis (e.g., $\mu=10$ vs. $\mu=12$).

But what about more realistic, *composite* hypotheses, like $H_0: \mu=10$ vs. $H_1: \mu > 10$? There might not be a single test that is "most powerful" for every possible value in the alternative (e.g., for $\mu=10.1$ and for $\mu=25$).

In our final theoretical lesson, we will generalize the Likelihood Ratio Test and introduce **Wilks' Theorem**, which gives us a powerful, universal method for testing composite hypotheses with large samples.

Up Next: Generalizing the LRT: Wilks' Theorem