Lesson 3.9: The Logic of Statistical Decisions
We now move from estimation to decision-making. This lesson introduces the formal 'courtroom' logic of hypothesis testing. We'll define the Null (H₀) and Alternative (H₁) hypotheses, and then explore the two types of errors we can make—Type I (α) and Type II (β)—which represent the fundamental tradeoffs in any decision based on data.
Part 1: The Courtroom of Statistics
The framework of hypothesis testing is a direct parallel to a legal trial. Understanding this analogy is the key to mastering the logic.
The Analogy: Innocent Until Proven Guilty
- The Null Hypothesis (H₀): This is the default assumption of "innocence." It's the statement of no effect, no change, no difference. We presume it's true unless overwhelmed by evidence. (e.g., "This new drug has no effect.")
- The Alternative Hypothesis (H₁): This is the prosecutor's claim. It's what we are trying to find evidence *for*. It's the statement of a real effect. (e.g., "This new drug is effective.")
- The Data: This is the evidence presented in court (DNA, witness testimony, etc.).
- The Verdict: The jury never declares the defendant "innocent." They either **"Reject H₀"** (find them guilty) or **"Fail to Reject H₀"** (find them not guilty).
The Asymmetry of Hypothesis Testing
Our goal is to see if we have enough evidence to challenge the default belief (H₀). We never "accept" H₀ or "prove" H₁.
Our conclusion is always one of two things:
- Reject H₀: The evidence from our sample is so strong that the "no effect" theory looks ridiculous.
- Fail to Reject H₀: The evidence was not strong enough. This doesn't mean H₀ is true, just that we couldn't disprove it. (Absence of evidence is not evidence of absence).
Part 2: The Four Possible Outcomes
Because our decision is based on a random sample, it can be wrong. There are exactly four possible outcomes when we make a decision.
| | H₀ is True (Drug is useless) | H₁ is True (Drug is effective) |
|---|---|---|
| Our Decision: Reject H₀ | Type I Error (α), "False Positive" | Correct Decision (Power), "True Positive" |
| Our Decision: Fail to Reject H₀ | Correct Decision, "True Negative" | Type II Error (β), "False Negative" |
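The four cells can be made concrete with a small simulation. The sketch below is only an illustration under assumptions that are not part of this lesson: a one-sided z-test with known standard deviation 1, a sample size of 25, and a hypothetical true effect of 0.5 when H₁ holds. The function names are made up for the example. It repeats the experiment many times and counts how often H₀ is rejected in a world where H₀ is true and in a world where H₁ is true.

```python
import random
from statistics import NormalDist, fmean

def reject_h0(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 against H1: mu > mu0.

    Assumes sigma is known (an assumption made for this illustration).
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # rejection threshold for the z-score
    return z > z_crit

def rejection_rate(true_mu, n=25, trials=5000, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, 1.0) for _ in range(n)]
        if reject_h0(sample, alpha=alpha):
            rejections += 1
    return rejections / trials

# World 1: H0 is true (no effect). Every rejection is a false positive.
type_i_rate = rejection_rate(true_mu=0.0)

# World 2: H1 is true (hypothetical effect of 0.5). Every rejection is a true positive.
power = rejection_rate(true_mu=0.5)
type_ii_rate = 1 - power  # failures to reject are false negatives

print(f"Type I rate  (H0 true, rejected):      {type_i_rate:.3f}")
print(f"Power        (H1 true, rejected):      {power:.3f}")
print(f"Type II rate (H1 true, not rejected):  {type_ii_rate:.3f}")
```

In the first world the rejection rate should land close to the chosen α of 0.05. In the second world the rejection rate estimates the power, and its complement estimates β.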
Part 3: Defining α, β, and Power
Significance Level (α)
α = P(Reject H₀ | H₀ is true)
This is the probability of a Type I error, a "false alarm": convicting an innocent person. The researcher **chooses** α before the experiment (usually 5% or 1%). It sets our tolerance for making a false discovery.
Probability of Type II Error (β)
β = P(Fail to Reject H₀ | H₁ is true)
This is the probability of a "missed opportunity": letting a guilty person go free. We don't choose β directly. It depends on α, the sample size n, and the true effect size.
Definition: Power
Power = 1 − β = P(Reject H₀ | H₁ is true)
High power is the goal of a good statistical test. It's the probability that our test will correctly detect a real effect when there is one.
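As a worked instance of power, the sketch below computes it in closed form for one specific, assumed setting: a one-sided z-test with known standard deviation. The test choice, the function name `z_test_power`, and the example numbers (effect 0.5, σ = 1, n = 25) are illustrative assumptions, not part of this lesson. Under those assumptions, power equals 1 − Φ(z₁₋α − effect·√n / σ), where Φ is the standard normal CDF.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test (H1: mu > mu0) for a given true effect.

    Assumes normally distributed data with known sigma (illustrative only).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold set by alpha
    shift = effect * n ** 0.5 / sigma        # how far H1 moves the z-score on average
    return 1 - std_normal.cdf(z_crit - shift)

power = z_test_power(effect=0.5, sigma=1.0, n=25)
beta = 1 - power  # probability of a Type II error

print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When the true effect is zero, the function returns α itself: with no real effect, every rejection is a false alarm.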
The α / β Tradeoff:
For a fixed sample size, there is a direct tradeoff. If you lower α (make it harder to convict), you will inevitably increase β (let more guilty people go free), which reduces the power of your test. For a given test and effect size, the only way to reduce both error probabilities simultaneously is to collect more data (increase n).
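The tradeoff between the Type I error rate (α) and the Type II error rate (β) can be tabulated directly. This is a minimal sketch that again assumes a one-sided z-test with known standard deviation 1 and a hypothetical true effect of 0.5. The helper `beta_for` and all numbers are illustrative assumptions.

```python
from statistics import NormalDist

def beta_for(alpha, n, effect=0.5, sigma=1.0):
    """Type II error probability of a one-sided z-test (illustrative assumptions)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(z_crit - effect * n ** 0.5 / sigma)

# Within each sample size, a stricter alpha gives a larger beta.
# Moving from n=25 to n=100 lowers beta at every alpha.
for n in (25, 100):
    for alpha in (0.10, 0.05, 0.01):
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta_for(alpha, n):.4f}")
```

Reading down each block of three rows shows β rising as α falls. Comparing the two blocks shows that a larger sample lets you tighten α and still end up with a smaller β.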
- A/B Testing: A Type I error means launching a new website feature that doesn't actually work (Cost of False Discovery). A Type II error means failing to launch a feature that would have increased revenue (Cost of Missed Opportunity).
- Quantitative Finance: A Type I error means investing real money in a trading strategy that has no real alpha (catastrophic loss). A Type II error means passing on a genuinely profitable strategy (missed profits). Because the cost of a Type I error is so high, quants use very strict significance levels (small α).
This framework is how businesses and researchers formally manage the risk of making bad decisions based on data.
What's Next? The Mechanics of the Verdict
We've set up the courtroom and defined the possible errors. Now, how does the jury actually reach a verdict? How do we quantify the "strength of the evidence" to decide whether to reject H₀?
In the next lesson, we will learn the practical mechanics of hypothesis testing by defining **test statistics, p-values, and critical regions**.