Lesson 3.11: The Theory of the 'Best' Test: The Neyman-Pearson Lemma

We know how to conduct a hypothesis test. But how do we know if our test is the best possible one? This lesson introduces the Neyman-Pearson Lemma, the fundamental theorem that provides a recipe for constructing the 'Most Powerful' test: the test that maximizes the probability of detecting a true effect (Power) at a fixed false-positive rate (α).

Part 1: The Quest for the Most Powerful Test

In our courtroom analogy, we have a tradeoff: for a fixed standard of evidence ($\alpha$), we want to maximize the probability of correctly convicting a guilty person (Power, $1-\beta$). The Neyman-Pearson (NP) Lemma tells us exactly how to build the test that achieves this.

First, the lemma works in a very specific scenario: a **simple null vs. a simple alternative** hypothesis.

Simple Hypothesis

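Specifies a single exact value for the parameter, so the distribution of the data is completely determined. Ex: $H_0: \mu = 100$ (with any other parameters, such as $\sigma$, treated as known)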

Composite Hypothesis

Specifies a range of values for the parameter. Ex: $H_1: \mu > 100$

Part 2: The Core Weapon: The Likelihood Ratio

The test that the NP Lemma prescribes is built on a simple, powerful idea: the **Likelihood Ratio**.

The 'Strength of Evidence' Score

The Likelihood Ratio, $\Lambda$, simply asks: "How many times more likely is my observed data under the alternative hypothesis compared to the null hypothesis?"

Definition: The Likelihood Ratio (Λ)

$$\Lambda(\mathbf{x}) = \frac{L(\theta_1 \mid \mathbf{x})}{L(\theta_0 \mid \mathbf{x})} = \frac{\text{Likelihood of Data if } H_1 \text{ is True}}{\text{Likelihood of Data if } H_0 \text{ is True}}$$

A large ratio ($\Lambda \gg 1$) means our data strongly supports $H_1$. A small ratio ($\Lambda \ll 1$) means our data is more consistent with $H_0$.
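As a rough illustration of the definition, the sketch below computes $\Lambda$ for i.i.d. normal data with a known standard deviation. The function name `likelihood_ratio`, the sample values, and the choices $\mu_0 = 100$, $\mu_1 = 103$, $\sigma = 2$ are all assumptions made up for this example, not part of the lesson.

```python
import math
from statistics import NormalDist

def likelihood_ratio(data, mu0, mu1, sigma):
    """Lambda(x) = L(mu1 | x) / L(mu0 | x) for i.i.d. Normal(mu, sigma) data."""
    # Sum log-densities instead of multiplying densities to avoid underflow.
    log_l1 = sum(math.log(NormalDist(mu1, sigma).pdf(x)) for x in data)
    log_l0 = sum(math.log(NormalDist(mu0, sigma).pdf(x)) for x in data)
    return math.exp(log_l1 - log_l0)

# Hypothetical sample, invented for illustration only.
sample = [101.2, 103.5, 99.8, 104.1, 102.4]
lam = likelihood_ratio(sample, mu0=100, mu1=103, sigma=2)
print(f"Lambda = {lam:.2f}")
```

Because the sample mean here sits closer to 103 than to 100, the ratio comes out above 1, meaning this made-up data is more likely under $H_1$ than under $H_0$.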

Part 3: The Neyman-Pearson Lemma

Theorem: The Neyman-Pearson (NP) Lemma

When testing a simple null $H_0: \theta = \theta_0$ against a simple alternative $H_1: \theta = \theta_1$, the **Most Powerful (MP) test** at a significance level $\alpha$ is a Likelihood Ratio Test (LRT).

The test has the following decision rule:

Reject $H_0$ if $\Lambda(\mathbf{x}) > k$

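Where the critical value $k$ is chosen such that the probability of a Type I error is exactly $\alpha$ (with discrete data an exact match may not be attainable without randomizing on the boundary $\Lambda = k$):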

$$P(\Lambda(\mathbf{X}) > k \mid \theta = \theta_0) = \alpha$$

This theorem is a guarantee: no other test for this problem whose Type I error rate is at most $\alpha$ can have a higher Power (a lower $\beta$) than this one.
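To see the recipe in action, consider i.i.d. normal data with known $\sigma$ and $\mu_1 > \mu_0$. In that case $\ln \Lambda(\mathbf{x}) = \frac{n(\mu_1 - \mu_0)}{\sigma^2}\left(\bar{x} - \frac{\mu_0 + \mu_1}{2}\right)$, which increases with $\bar{x}$, so "reject if $\Lambda > k$" is the same rule as "reject if $\bar{x} > c$". The sketch below finds $c$, the matching $k$, and the power. The function name `mp_test_normal_mean` and the numbers ($\mu_0 = 100$, $\mu_1 = 106$, $\sigma = 15$, $n = 25$) are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist

def mp_test_normal_mean(mu0, mu1, sigma, n, alpha):
    """Cutoffs and power of the NP test of mu0 vs mu1 (mu1 > mu0), sigma known."""
    se = sigma / math.sqrt(n)
    # Under H0 the sample mean is Normal(mu0, se); leave probability alpha above c.
    c = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # Translate the cutoff on the sample mean into the cutoff k on Lambda.
    log_k = n * (mu1 - mu0) / sigma**2 * (c - (mu0 + mu1) / 2)
    # Power: probability that the sample mean exceeds c when H1 is true.
    power = 1 - NormalDist(mu1, se).cdf(c)
    return c, math.exp(log_k), power

c, k, power = mp_test_normal_mean(mu0=100, mu1=106, sigma=15, n=25, alpha=0.05)
print(f"Reject H0 if sample mean > {c:.2f} (Lambda > {k:.2f}); power = {power:.3f}")

# A competing level-alpha test that ignores 13 of the 25 observations.
_, _, power_wasteful = mp_test_normal_mean(mu0=100, mu1=106, sigma=15, n=12, alpha=0.05)
print(f"Power using only 12 observations = {power_wasteful:.3f}")
```

The second call is one example of "another test you could invent": it has the same $\alpha$ but throws away data, and its power is lower, as the lemma says it must be.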

What's Next? The Real World of Composite Hypotheses

The Neyman-Pearson Lemma gives us the "best" test for a simple vs. simple hypothesis (e.g., $\mu=10$ vs. $\mu=12$).

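But what about more realistic, *composite* hypotheses, like $H_0: \mu=10$ vs. $H_1: \mu > 10$? The lemma only compares two specific values, so it does not directly say which test is best when the alternative contains many values (e.g., both $\mu=10.1$ and $\mu=25$). In some one-sided problems, such as a normal mean with known variance, the same test happens to be most powerful against every value in the alternative. In general, and notably for two-sided alternatives like $\mu \neq 10$, there might not be a single test that is "most powerful" everywhere.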
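As an illustration of how a single test can fail to be best across a whole composite alternative, the sketch below assumes normal data with known $\sigma$ and a two-sided alternative $\mu \neq 10$. It compares two level-$\alpha$ tests: one rejects for large sample means, the other for small ones. The function names and the values (standard error 1, true means 8 and 12) are assumptions for this example.

```python
from statistics import NormalDist

def power_upper_tail(mu, mu0, se, alpha):
    """Power of the test that rejects H0 when the sample mean is large."""
    c = NormalDist(mu0, se).inv_cdf(1 - alpha)
    return 1 - NormalDist(mu, se).cdf(c)

def power_lower_tail(mu, mu0, se, alpha):
    """Power of the test that rejects H0 when the sample mean is small."""
    c = NormalDist(mu0, se).inv_cdf(alpha)
    return NormalDist(mu, se).cdf(c)

# se = sigma / sqrt(n); for example sigma = 4 and n = 16 (assumed values).
mu0, se, alpha = 10, 1.0, 0.05
for mu in (8, 12):
    up = power_upper_tail(mu, mu0, se, alpha)
    low = power_lower_tail(mu, mu0, se, alpha)
    print(f"true mu = {mu}: upper-tail power = {up:.4f}, lower-tail power = {low:.4f}")
```

Each test is the Neyman-Pearson test against alternatives on its own side of 10 and has almost no power on the other side, so neither one is most powerful against every $\mu \neq 10$.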

In our final theoretical lesson, we will generalize the Likelihood Ratio Test and introduce **Wilks' Theorem**, which gives us a powerful, universal method for testing composite hypotheses with large samples.

Up Next: Generalizing the LRT: Wilks' Theorem