The detective work of data science: making decisions under uncertainty.
Think of hypothesis testing as data detective work. You start with a default assumption, the Null Hypothesis (H₀), which states that there is no effect or no difference (e.g., "a new drug has no effect"). Then you gather evidence (your sample data) to see whether it is strong enough to reject that default assumption in favor of the Alternative Hypothesis (H₁) (e.g., "the new drug has an effect").
The p-value is the crucial piece of evidence. It's the probability of observing data at least as extreme as yours if the null hypothesis were actually true. A small p-value (conventionally below 0.05) means that data like yours would be unlikely under the null hypothesis, giving you a reason to reject it. Note that the p-value is not the probability that the null hypothesis is true.
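To make the definition concrete, here is a minimal sketch of a permutation test, one simple way to estimate a p-value directly from its definition. It is an illustration under stated assumptions, not a method prescribed by this guide: the function name `permutation_p_value`, the number of shuffles, and the recovery-time data are all hypothetical. If the null hypothesis holds, the group labels carry no information, so shuffling them shows how often a difference at least as large as the observed one arises by chance.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Under H0 the labels are interchangeable, so we shuffle the pooled
    data and count how often the shuffled difference is at least as
    extreme as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical recovery times in days, invented for illustration only.
drug = [5.1, 4.8, 5.5, 4.9, 5.0, 4.7, 5.2, 4.6]
placebo = [6.0, 5.9, 6.3, 5.7, 6.1, 5.8, 6.4, 5.6]

p = permutation_p_value(drug, placebo)
print(f"estimated p-value: {p:.4f}")
print("reject H0 at 0.05" if p < 0.05 else "fail to reject H0 at 0.05")
```

Because the two made-up groups do not overlap at all, almost no shuffle reproduces a gap that large, so the estimated p-value is very small. With two identical groups the observed difference is zero, every shuffle is "at least as extreme", and the estimate is 1.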
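The type of data you have determines the statistical test you can use. The main fork in the road is between parametric tests, which assume the data follow a particular distribution (usually the normal distribution), and non-parametric tests, which make no such assumption and typically work on ranks instead of raw values.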
Compares the means of two groups, assuming normal distribution.
Compares means of large samples (n > 30, as a rule of thumb) when the population variance is known.
Compares the averages of three or more groups.
Compares the variances (spread) of two or more groups.
Measures the linear relationship between two continuous variables.
Analyzes categorical data to test whether two variables are significantly associated.
Alternative to the T-Test when data is not normally distributed.
Alternative to ANOVA for comparing three or more groups.
Alternative to the paired T-Test for repeated measurements.
Measures the monotonic relationship between two variables, based on their ranks.
The non-parametric alternative to a repeated-measures ANOVA.
Tests if a sample is drawn from a specific distribution.
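The difference between a parametric measure and its rank-based counterpart can be sketched with the two correlation entries above: the one for linear relationships and the one for monotonic relationships. These are commonly known as the Pearson and Spearman coefficients, but those names are an assumption here, since the list does not give them. The helper names `pearson`, `ranks`, and `spearman` and the sample data are likewise hypothetical. The sketch uses only the standard library and treats the rank-based coefficient as the linear one applied to ranks.

```python
import math

def pearson(x, y):
    """Linear correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank correlation: the linear correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Illustrative data: y always rises with x, but along a curve, not a line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

print(f"linear (Pearson-style) correlation: {pearson(x, y):.3f}")
print(f"rank (Spearman-style) correlation:  {spearman(x, y):.3f}")
```

The linear coefficient comes out below 1 because the points do not lie on a straight line, while the rank-based coefficient is 1 (up to floating-point rounding) because the ordering of `y` matches the ordering of `x` exactly. In practice a library would normally be used instead of hand-rolled helpers; SciPy, for example, provides `scipy.stats.pearsonr` and `scipy.stats.spearmanr`, which also return p-values.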