An Introduction to Hypothesis Testing

A practical guide to deciding if your results are a real breakthrough or just random noise.

Introduction: The Two Stories
Ever see a headline like "New Trading Strategy Boosts Profits!" and wonder if it's true? Hypothesis testing is the tool that helps us find out. We'll follow two stories from start to finish.

The Coffee Experiment

We want to know if a new coffee bean actually makes students more alert.

The Trading Algorithm

We need to decide if a new trading algorithm truly performs better than our current one.

Step 1: Ask a Question (The Hypotheses)
We formalize our question into two competing statements. Think of this as a statistical courtroom drama.

On Trial: The Null Hypothesis (H₀)

This is the "skeptic's view" or the "status quo." It assumes there's no real effect or difference.

  • ☕ Coffee (H₀): The new coffee has no effect on alertness. Any difference in scores is just random.
  • 📈 Trading (H₀): The new algorithm is not better than the old one. Any difference in returns is just market noise.

The Challenger: The Alternative Hypothesis (Hₐ)

This is the new idea we're testing: the claim we're looking for evidence to support.

  • ☕ Coffee (Hₐ): The new coffee does increase student alertness.
  • 📈 Trading (Hₐ): The new algorithm does generate higher average returns.
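
In symbols (using μ for the true mean outcome, notation we're introducing here), both cases share the same one-sided structure:

H₀: μ_new = μ_old (any observed difference is just chance)
Hₐ: μ_new > μ_old (the new option's true mean really is higher)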

Step 2: Set the Rules (Confidence & Significance)
Before we analyze the data, we must define what counts as "strong enough" evidence. This is deeply related to Confidence Intervals.

We choose how confident we want to be in our conclusion. The standard is 95% confidence. This means we accept a 5% risk of a false alarm: declaring an effect that isn't really there. This risk is the Significance Level, alpha (α).

α = 1 - Confidence Level

For 95% confidence, α = 1 - 0.95 = 0.05

The Bottom Line: Any result with less than a 5% probability of occurring by random chance will be considered "statistically significant."
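
As a quick sanity check, here's the same arithmetic in code (a minimal sketch; the confidence levels below are just illustrative choices):

```python
# Significance level (alpha) from a chosen confidence level.
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(f"{confidence:.0%} confidence -> alpha = {alpha:.2f}")
```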

Step 3: The Verdict (The P-Value)
The p-value is the probability of seeing your data, or something even more extreme, *assuming the null hypothesis (the skeptic's view) is true*.

Case Result: ☕ The Coffee Experiment

Finding: The "new coffee" group had an average alertness score 10 points higher.

P-Value: 0.02

Verdict: There's only a 2% chance we'd see a difference this large (or larger) if the coffee had no real effect. Since 0.02 is **less than** our 0.05 significance level, we have a winner!

Conclusion: We reject H₀. The evidence suggests the new coffee really does increase alertness.
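
A p-value like this is typically computed with a test statistic. Here's a minimal sketch using SciPy's one-sided Welch's t-test; the data below is simulated to roughly match the story (hypothetical numbers, not the experiment's actual results):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated alertness scores (hypothetical data, ~10-point gap in means).
control    = rng.normal(loc=70, scale=12, size=30)  # regular coffee
new_coffee = rng.normal(loc=80, scale=12, size=30)  # new coffee

# One-sided Welch's t-test: Ha says the new coffee INCREASES alertness.
t_stat, p_value = stats.ttest_ind(new_coffee, control,
                                  equal_var=False, alternative="greater")

alpha = 0.05
print(f"p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```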

Case Result: 📈 The Trading Algorithm

Finding: The new algorithm's average daily return was 0.05% higher.

P-Value: 0.25

Verdict: There is a 25% chance we'd see a difference this large (or larger) even if the new algorithm were no better than the old one. Since 0.25 is much greater than 0.05, the evidence is weak.

Conclusion: We fail to reject H₀. We don't have enough evidence to invest in the new algorithm. (Note: failing to reject H₀ doesn't prove the two algorithms are equal; it only means we couldn't rule out luck.)
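
The same machinery applies here. One reasonable setup (a sketch with simulated returns, assuming both algorithms were run over the same days so we can pair them) is a one-sided test on the daily return differences:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated daily returns (%) over ~120 trading days (hypothetical numbers).
old_algo = rng.normal(loc=0.10, scale=1.5, size=120)
new_algo = old_algo + rng.normal(loc=0.05, scale=1.5, size=120)

# Paired, one-sided test: is the new algorithm's mean daily return higher?
diffs = new_algo - old_algo
t_stat, p_value = stats.ttest_1samp(diffs, popmean=0.0, alternative="greater")

print(f"mean difference = {diffs.mean():.3f}% per day, p-value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```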

Step 4: Know the Risks (When Your Conclusion is Wrong)
Even with this process, we can still make an error. It's crucial to understand the two types:

  • ⚠️ Type I Error (False Positive): Rejecting H₀ when it's actually true, e.g., concluding the coffee boosts alertness when it doesn't. Our α of 0.05 is precisely the risk of this error we agreed to accept in Step 2.
  • 🙈 Type II Error (False Negative): Failing to reject H₀ when it's actually false, e.g., passing on a trading algorithm that really is better.

For an interactive visualization, see our guide on Type I & Type II Errors.
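
To see what a Type I error rate looks like, here's a minimal simulation sketch: we generate many experiments where H₀ is true by construction and count how often a standard t-test would wrongly declare significance. The false-positive rate should land near our chosen α of 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_trials, false_positives = 0.05, 10_000, 0

for _ in range(n_trials):
    # Both groups come from the SAME distribution, so H0 is true by construction.
    a = rng.normal(loc=70, scale=12, size=30)
    b = rng.normal(loc=70, scale=12, size=30)
    _, p = stats.ttest_ind(a, b, equal_var=False)
    if p < alpha:
        false_positives += 1  # a Type I error: rejecting a true H0

print(f"Type I error rate ≈ {false_positives / n_trials:.3f} (expected ≈ {alpha})")
```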