Lesson 3.11: The Theory of the 'Best' Test: The Neyman-Pearson Lemma
We know how to conduct a hypothesis test. But how do we know if our test is the best possible one? This lesson introduces the Neyman-Pearson Lemma, the fundamental theorem that provides a recipe for constructing the 'Most Powerful' test—the test that maximizes our chances of finding a true effect (Power) for a fixed tolerance for false positives (α).
Part 1: The Quest for the Most Powerful Test
In our courtroom analogy, we have a tradeoff: for a fixed standard of evidence ($\alpha$), we want to maximize the probability of correctly convicting a guilty person (Power, $1 - \beta$). The Neyman-Pearson (NP) Lemma tells us exactly how to build the test that achieves this.
First, the lemma works in a very specific scenario: a **simple null vs. a simple alternative** hypothesis.
**Simple hypothesis:** Specifies an exact value for the parameter. Ex: $H_0: \theta = \theta_0$
**Composite hypothesis:** Specifies a range of values for the parameter. Ex: $H_1: \theta > \theta_0$
Part 2: The Core Weapon: The Likelihood Ratio
The test that the NP Lemma prescribes is built on a simple, powerful idea: the **Likelihood Ratio**.
The 'Strength of Evidence' Score
The Likelihood Ratio, $\Lambda$, simply asks: "How many times more likely is my observed data under the alternative hypothesis compared to the null hypothesis?"
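Definition: The Likelihood Ratio (Λ)
$$\Lambda(x) = \frac{L(\theta_1 \mid x)}{L(\theta_0 \mid x)}$$
Here $L(\theta \mid x)$ is the likelihood of the observed data $x$ at parameter value $\theta$, with $\theta_0$ the value specified by the null hypothesis and $\theta_1$ the value specified by the alternative.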
A large ratio ($\Lambda \gg 1$) means our data strongly supports $H_1$. A small ratio ($\Lambda \ll 1$) means our data is more consistent with $H_0$.
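To make the ratio concrete, here is a minimal sketch that assumes i.i.d. normal data with a known standard deviation. The sample values, the two hypothesized means (0 and 1), and the function names are hypothetical choices for illustration only.

```python
import math

def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. Normal(mu, sigma^2) observations."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - squared_error / (2 * sigma ** 2)

def likelihood_ratio(data, mu0, mu1, sigma):
    """Lambda = L(mu1 | data) / L(mu0 | data), computed through logs for stability."""
    log_lambda = normal_log_likelihood(data, mu1, sigma) - normal_log_likelihood(data, mu0, sigma)
    return math.exp(log_lambda)

# Hypothetical sample, chosen only for illustration.
sample = [0.9, 1.4, 0.2, 1.1, 0.8]
lam = likelihood_ratio(sample, mu0=0.0, mu1=1.0, sigma=1.0)
print(f"Lambda = {lam:.2f}")
```

For this sample the log of the ratio works out to $\sum x_i - n/2 = 4.4 - 2.5 = 1.9$, so $\Lambda = e^{1.9} \approx 6.69$: the data are roughly 6.7 times more likely under the alternative mean than under the null mean.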
Part 3: The Neyman-Pearson Lemma
When testing a simple null $H_0: \theta = \theta_0$ against a simple alternative $H_1: \theta = \theta_1$, the **Most Powerful (MP) test** at a significance level $\alpha$ is a Likelihood Ratio Test (LRT).
The test has the following decision rule:
Reject $H_0$ if $\Lambda > k$
Where the critical value $k$ is chosen such that the probability of a Type I error is exactly $\alpha$:
$$P(\Lambda > k \mid H_0 \text{ is true}) = \alpha$$
This theorem is a guarantee: No other test you can possibly invent for this problem will have a higher Power (a lower $\beta$) for the same $\alpha$ level.
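As an illustration of the guarantee, consider a sketch under assumed conditions: i.i.d. normal data with known $\sigma$, and a simple alternative mean $\mu_1 > \mu_0$. In that setting the log of the likelihood ratio equals $\frac{n(\mu_1 - \mu_0)}{\sigma^2}\left(\bar{x} - \frac{\mu_0 + \mu_1}{2}\right)$, which increases with the sample mean, so rejecting for a large likelihood ratio is the same as rejecting when $\bar{x}$ exceeds a cutoff $c$. The code below calibrates $c$ to the chosen $\alpha$ and compares the resulting power with a two-sided z-test at the same $\alpha$. The parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist

def mp_test_normal_mean(mu0, mu1, sigma, n, alpha):
    """Likelihood ratio test of H0: mu = mu0 vs H1: mu = mu1 (mu1 > mu0), known sigma.

    Returns the sample-mean cutoff c and the power at mu1.
    """
    std_err = sigma / math.sqrt(n)
    # Cutoff chosen so that P(sample mean > c | H0) = alpha.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    # Power: probability of exceeding the cutoff when the true mean is mu1.
    power = 1 - NormalDist(mu1, std_err).cdf(cutoff)
    return cutoff, power

def two_sided_power(mu0, mu1, sigma, n, alpha):
    """Power at mu1 of the two-sided z-test with the same alpha (a competing test)."""
    std_err = sigma / math.sqrt(n)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * std_err
    sampling = NormalDist(mu1, std_err)
    return sampling.cdf(mu0 - half_width) + (1 - sampling.cdf(mu0 + half_width))

# Hypothetical setup: H0: mu = 0 vs H1: mu = 0.5, sigma = 1, n = 25, alpha = 0.05.
cutoff, mp_power = mp_test_normal_mean(0.0, 0.5, 1.0, 25, 0.05)
competitor_power = two_sided_power(0.0, 0.5, 1.0, 25, 0.05)
print(f"cutoff = {cutoff:.3f}")
print(f"power of likelihood ratio test = {mp_power:.3f}")
print(f"power of two-sided z-test = {competitor_power:.3f}")
```

With these assumed values the likelihood ratio test has power of about 0.80, while the two-sided test reaches only about 0.71 at the same $\alpha$, which is consistent with what the lemma promises.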
What's Next? The Real World of Composite Hypotheses
The Neyman-Pearson Lemma gives us the "best" test for a simple vs. simple hypothesis (e.g., $H_0: \theta = \theta_0$ vs. $H_1: \theta = \theta_1$).
But what about more realistic, *composite* hypotheses, like $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$? There might not be a single test that is "most powerful" for every possible value in the alternative (e.g., both for $\theta > \theta_0$ and for $\theta < \theta_0$).
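A small numerical sketch can show why this happens. It assumes a normal model with known $\sigma$ and a two-sided alternative around a null mean of 0; the parameter values and the function name are hypothetical. The right-tailed test is the one the lemma picks for alternatives above the null, and the left-tailed test is the one it picks for alternatives below it.

```python
import math
from statistics import NormalDist

def one_sided_power(true_mu, mu0, sigma, n, alpha, tail):
    """Power of a one-sided z-test of H0: mu = mu0 when the true mean is true_mu."""
    std_err = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    sampling = NormalDist(true_mu, std_err)  # distribution of the sample mean
    if tail == "right":
        return 1 - sampling.cdf(mu0 + z_crit * std_err)
    if tail == "left":
        return sampling.cdf(mu0 - z_crit * std_err)
    raise ValueError("tail must be 'right' or 'left'")

# Hypothetical setup: mu0 = 0, sigma = 1, n = 25, alpha = 0.05.
for true_mu in (-0.5, 0.5):
    right = one_sided_power(true_mu, 0.0, 1.0, 25, 0.05, "right")
    left = one_sided_power(true_mu, 0.0, 1.0, 25, 0.05, "left")
    print(f"true mu = {true_mu:+.1f}: right-tailed power = {right:.3f}, left-tailed power = {left:.3f}")
```

Each one-sided test is strong on its own side of the null and has power below $\alpha$ on the other side, so neither can be most powerful across the whole two-sided alternative.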
In our final theoretical lesson, we will generalize the Likelihood Ratio Test and introduce **Wilks' Theorem**, which gives us a powerful, universal method for testing composite hypotheses with large samples.