The Central Limit Theorem (CLT)

The surprising and powerful idea that the average of many random things is far more predictable than the things themselves.

Order from Chaos

The Central Limit Theorem is one of the most magical ideas in all of statistics. In essence, it states that if you repeatedly draw random samples of a sufficiently large size from *almost any* population (no matter how weirdly shaped its distribution is, provided its variance is finite) and calculate the average (or mean) of each sample, the distribution of those averages will be approximately a Normal (bell-shaped) distribution.

Imagine you have a barrel full of tickets with numbers written on them. The numbers could be spread evenly across a range (a Uniform distribution), mostly small with a few huge ones (a skewed distribution), or anything else. The CLT says that if you repeatedly:

  1. Pull out a handful of tickets (a sample, e.g., n=30).
  2. Calculate the average of that handful.
  3. Write down the average and put the tickets back.

...and you do this thousands of times, the histogram of the averages you wrote down will form a beautiful, clean bell curve. This is true even if the original numbers in the barrel had a completely different-looking histogram!
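The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own simulation: the barrel contents (exponentially distributed numbers with mean 5), the barrel size, and the helper names `make_barrel`, `collect_averages`, and `text_histogram` are all assumptions chosen for the example.

```python
import random
import statistics


def make_barrel(rng, size=10_000):
    """Build a skewed barrel: mostly small numbers with a few huge ones.

    The exponential shape is an illustrative assumption, not the article's data.
    """
    return [rng.expovariate(1 / 5.0) for _ in range(size)]


def collect_averages(barrel, rng, handful_size=30, repeats=5_000):
    """Repeat the three steps: draw a handful, average it, put it back."""
    averages = []
    for _ in range(repeats):
        handful = rng.sample(barrel, handful_size)   # step 1: pull out a handful
        averages.append(statistics.fmean(handful))   # step 2: average it
        # step 3: the barrel list is never modified, so the tickets are "put back"
    return averages


def text_histogram(values, bins=15, width=50):
    """Render a rough text histogram, one line per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    return "\n".join(
        f"{lo + i * step:7.2f} | {'#' * round(width * c / peak)}"
        for i, c in enumerate(counts)
    )


rng = random.Random(42)  # fixed seed so the run is repeatable
barrel = make_barrel(rng)
averages = collect_averages(barrel, rng)
print(text_histogram(averages))
print("barrel mean:     ", round(statistics.fmean(barrel), 2))
print("mean of averages:", round(statistics.fmean(averages), 2))
```

Even though the barrel itself is strongly skewed, the printed histogram of averages should look roughly symmetric and bell-shaped, centered near the barrel's mean.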

The Math Behind the Magic

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables with population mean $\mu$ and finite variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the sample mean. The CLT states that as $n \to \infty$, the distribution of the standardized sample mean approaches a standard normal distribution:

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$$

This tells us two amazing things about the distribution of the sample means:

  • The mean of the sample means will be the same as the original population mean ($\mu$).
  • The standard deviation of the sample means (called the "standard error") will be the original population's standard deviation divided by the square root of the sample size ($\sigma/\sqrt{n}$).
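
Both facts follow from linearity of expectation and from independence, and they hold for every sample size $n$, not only in the limit. A short derivation under the i.i.d. assumptions stated above ($\operatorname{SE}$ is shorthand introduced here for the standard error):

```latex
\begin{align*}
\mathbb{E}[\bar{X}_n]
  &= \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i]
   = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}_n)
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
\operatorname{SE}(\bar{X}_n)
  &= \sqrt{\operatorname{Var}(\bar{X}_n)} = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

For example, with $\sigma = 2.89$ and $n = 30$, the standard error is $2.89/\sqrt{30} \approx 0.53$.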

The Laboratory

Adjust the parameters and run the simulation to see the CLT in action.

[Interactive simulation omitted. It shows three panels:]

  1. Population Distribution: the shape of the original barrel of tickets, from which samples are drawn. Population Mean ($\mu$): 5.00, Population Std Dev ($\sigma$): 2.89.
  2. Current Sample: the histogram of a single handful of tickets drawn from the population, and its calculated mean.
  3. Distribution of Sample Means: the histogram of the *averages* from each handful drawn, which forms a bell curve as averages accumulate. This panel reports the mean of the sample means ($\bar{x}$), the standard deviation of the sample means (the standard error), and the total number of averages collected (out of 1000).
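
A non-interactive stand-in for the lab can be sketched in Python. It assumes the population is continuous Uniform(0, 10), which is consistent with the stated mean of 5.00 and standard deviation of 2.89 but is not confirmed by the page, and it uses a sample size of 30 and 1000 samples. The function names `run_lab` and `coverage` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

LOW, HIGH = 0.0, 10.0                 # assumed population: Uniform(0, 10)
MU = (LOW + HIGH) / 2                 # population mean, 5.00
SIGMA = (HIGH - LOW) / math.sqrt(12)  # population std dev, about 2.89


def run_lab(sample_size, num_samples, rng):
    """Draw num_samples handfuls of sample_size tickets; return each handful's mean."""
    return [
        fmean(rng.uniform(LOW, HIGH) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


def coverage(sample_means, sample_size, z=1.96):
    """Fraction of standardized sample means that fall within +/- z."""
    se = SIGMA / math.sqrt(sample_size)
    return sum(abs((m - MU) / se) <= z for m in sample_means) / len(sample_means)


rng = random.Random(2024)  # fixed seed so the run is repeatable
means = run_lab(sample_size=30, num_samples=1000, rng=rng)
normal_share = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)

print(f"Mean of sample means:      {fmean(means):.3f} (theory: {MU:.3f})")
print(f"Std. dev. of sample means: {stdev(means):.3f} (theory: {SIGMA / math.sqrt(30):.3f})")
print(f"Share within 1.96 std. errors: {coverage(means, 30):.3f} (N(0,1): {normal_share:.3f})")
```

The first two printed values should land close to $\mu$ and $\sigma/\sqrt{n}$, and the share of standardized means within 1.96 standard errors should be close to the 0.95 that a standard normal distribution predicts.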