An Introduction to Hypothesis Testing

A practical guide to deciding if your results are a real breakthrough or just random noise.

Introduction: The Two Stories
Ever see a headline like "New Trading Strategy Boosts Profits!" and wonder if it's true? Hypothesis testing is the tool that helps us find out. We'll follow two stories from start to finish.

The Coffee Experiment

We want to know if a new coffee bean actually makes students more alert.

The Trading Algorithm

We need to decide if a new trading algorithm truly performs better than our current one.

Step 1: Ask a Question (The Hypotheses)
We formalize our question into two competing statements. Think of this as a statistical courtroom drama.

On Trial: The Null Hypothesis (H₀)

This is the "skeptic's view" or the "status quo." It assumes there's no real effect or difference.

  • ☕ Coffee (H₀): The new coffee has no effect on alertness. Any difference in scores is just random.
  • 📈 Trading (H₀): The new algorithm is not better than the old one. Any difference in returns is just market noise.

The Challenger: The Alternative Hypothesis (Hₐ)

This is the new idea we're testing: the claim we're looking for evidence to support.

  • ☕ Coffee (Hₐ): The new coffee does increase student alertness.
  • 📈 Trading (Hₐ): The new algorithm does generate higher average returns.
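
In symbols (using μ for the true mean outcome, notation we're introducing here), both cases share the same one-sided structure:

H₀: μ_new = μ_old (any observed difference is just chance)
Hₐ: μ_new > μ_old (the new option's true mean really is higher)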

Step 2: Set the Rules (Confidence & Significance)
Before we analyze the data, we must define what counts as "strong enough" evidence. This is deeply related to Confidence Intervals.

We choose how confident we want to be in our conclusion. The standard is 95% confidence. This means we accept a 5% risk of a false alarm: declaring an effect that isn't really there. This risk is the Significance Level, alpha (α).

α = 1 - Confidence Level

For 95% confidence, α = 1 - 0.95 = 0.05

The Bottom Line: Any result with less than a 5% probability of occurring by random chance will be considered "statistically significant."
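
As a quick sanity check, here's the same arithmetic in code (a minimal sketch; the confidence levels below are just illustrative choices):

```python
# Significance level (alpha) from a chosen confidence level.
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(f"{confidence:.0%} confidence -> alpha = {alpha:.2f}")
```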

Step 3: The Verdict (The P-Value)
The p-value is the probability of seeing your data, or something even more extreme, *assuming the null hypothesis (the skeptic's view) is true*.

Case Result: ☕ The Coffee Experiment

Finding: The "new coffee" group had an average alertness score 10 points higher.

P-Value: 0.02

Verdict: There's only a 2% chance we'd see a difference this large (or larger) if the coffee had no real effect. Since 0.02 is **less than** our 0.05 significance level, we have a winner!

Conclusion: We reject H₀. The evidence suggests the new coffee really does increase alertness.
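
A p-value like this is typically computed with a test statistic. Here's a minimal sketch using SciPy's one-sided Welch's t-test; the data below is simulated to roughly match the story (hypothetical numbers, not the experiment's actual results):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated alertness scores (hypothetical data, ~10-point gap in means).
control    = rng.normal(loc=70, scale=12, size=30)  # regular coffee
new_coffee = rng.normal(loc=80, scale=12, size=30)  # new coffee

# One-sided Welch's t-test: Ha says the new coffee INCREASES alertness.
t_stat, p_value = stats.ttest_ind(new_coffee, control,
                                  equal_var=False, alternative="greater")

alpha = 0.05
print(f"p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```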

Case Result: 📈 The Trading Algorithm

Finding: The new algorithm's average daily return was 0.05% higher.

P-Value: 0.25

Verdict: There is a 25% chance we'd see a difference this large (or larger) even if the new algorithm were no better than the old one. Since 0.25 is much greater than 0.05, the evidence is weak.

Conclusion: We fail to reject H₀. We don't have enough evidence to invest in the new algorithm. (Note: failing to reject H₀ doesn't prove the two algorithms are equal; it only means we couldn't rule out luck.)
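
The same machinery applies here. One reasonable setup (a sketch with simulated returns, assuming both algorithms were run over the same days so we can pair them) is a one-sided test on the daily return differences:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated daily returns (%) over ~120 trading days (hypothetical numbers).
old_algo = rng.normal(loc=0.10, scale=1.5, size=120)
new_algo = old_algo + rng.normal(loc=0.05, scale=1.5, size=120)

# Paired, one-sided test: is the new algorithm's mean daily return higher?
diffs = new_algo - old_algo
t_stat, p_value = stats.ttest_1samp(diffs, popmean=0.0, alternative="greater")

print(f"mean difference = {diffs.mean():.3f}% per day, p-value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```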

Step 4: Know the Risks (When Your Conclusion is Wrong)
Even with this process, we can still make an error. It's crucial to understand the two types:

  • ⚠️ Type I Error (False Positive): Rejecting H₀ when it's actually true, e.g., concluding the coffee boosts alertness when it doesn't. Our α of 0.05 is precisely the risk of this error we agreed to accept in Step 2.
  • 🙈 Type II Error (False Negative): Failing to reject H₀ when it's actually false, e.g., passing on a trading algorithm that really is better.

For an interactive visualization, see our guide on Type I & Type II Errors.
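
To see what a Type I error rate looks like, here's a minimal simulation sketch: we generate many experiments where H₀ is true by construction and count how often a standard t-test would wrongly declare significance. The false-positive rate should land near our chosen α of 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_trials, false_positives = 0.05, 10_000, 0

for _ in range(n_trials):
    # Both groups come from the SAME distribution, so H0 is true by construction.
    a = rng.normal(loc=70, scale=12, size=30)
    b = rng.normal(loc=70, scale=12, size=30)
    _, p = stats.ttest_ind(a, b, equal_var=False)
    if p < alpha:
        false_positives += 1  # a Type I error: rejecting a true H0

print(f"Type I error rate ≈ {false_positives / n_trials:.3f} (expected ≈ {alpha})")
```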