Hypergeometric Distribution

Modeling the probability of successes in a sample drawn without replacement.

The "Drawing from a Deck" Distribution

The Hypergeometric distribution is used when you are sampling from a finite population without replacement. This is a key difference from the Binomial distribution, where each trial is independent because you are "replacing" after each draw.

The classic example is drawing cards from a deck. If you draw a 5-card hand, what's the probability of getting exactly 2 spades? In finance, this can model credit risk in a portfolio of bonds: if you have a portfolio of 100 bonds and know that 5 will default, what is the probability that if you randomly select 10 bonds for an audit, exactly 1 of them will be a defaulter?
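The bond-audit calculation can be sketched with Python's standard-library `math.comb`; the helper name `hypergeom_pmf` is ours, not a library API:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes in a sample of size n, drawn without
    replacement from a population of N items containing K successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Bond-audit example from the text: 100 bonds, 5 defaulters,
# audit 10 at random -- probability exactly 1 defaulter is found.
p = hypergeom_pmf(k=1, N=100, K=5, n=10)
print(f"P(exactly 1 defaulter in the audit) = {p:.4f}")
```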

Interactive Hypergeometric Distribution
Adjust the parameters of the population and sample to see how the probabilities change.


Core Concepts

Probability Mass Function (PMF)
The PMF gives the probability of getting exactly `k` successes in a sample of size `n`.
P(X=k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}
  • The numerator counts the ways to achieve the desired outcome: the number of ways to choose `k` success items from the `K` available successes (\binom{K}{k}), multiplied by the number of ways to choose the remaining `n-k` failure items from the `N-K` failures (\binom{N-K}{n-k}).
  • The denominator is the total number of possible outcomes: the number of ways to choose any `n` items from the population of `N` (\binom{N}{n}).
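A minimal implementation of this PMF, with a sanity check that the probabilities sum to 1 over the support. The 5-card spade example uses the obvious parameters N=52, K=13, n=5; the function name is ours:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    # Numerator: ways to pick k successes times ways to pick n-k failures.
    # Denominator: all ways to pick a sample of size n from N.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

N, K, n = 52, 13, 5          # 5-card hand; spades are the "successes"
probs = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]

# A valid PMF must sum to 1 over its support.
assert abs(sum(probs) - 1.0) < 1e-12
print(f"P(exactly 2 spades) = {probs[2]:.4f}")
```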
Expected Value & Variance

Expected Value (Mean)

E[X] = n \cdot \frac{K}{N}

The mean is intuitive: it's the sample size `n` multiplied by the initial proportion of successes in the population, `K/N`.

Variance

Var(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{N-n}{N-1}

The variance is similar to the Binomial variance, but includes a "finite population correction factor" \frac{N-n}{N-1}, which accounts for the fact that the draws are not independent: each draw shrinks the remaining population.
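Both closed forms can be verified numerically by computing the mean and variance directly from the PMF. The parameters N=50, K=20, n=10 here are arbitrary illustrative choices:

```python
from math import comb

N, K, n = 50, 20, 10
p = K / N

pmf = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

# Closed forms match the moments computed from the PMF.
assert abs(mean - n * p) < 1e-9
fpc = (N - n) / (N - 1)
assert abs(var - n * p * (1 - p) * fpc) < 1e-9

# Sampling without replacement has strictly smaller variance than the
# Binomial n*p*(1-p), by exactly the finite population correction factor.
assert var < n * p * (1 - p)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```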

Key Derivations

Deriving the Expected Value (Mean)
While deriving the mean directly from the PMF is algebraically complex, we can use a more elegant method involving indicator variables.

Step 1: Define Indicator Variables

Let X be the total number of successes in our sample of size n. We can express X as the sum of n indicator variables:

X = X_1 + X_2 + \dots + X_n

where X_i is 1 if the i-th item drawn is a success, and 0 otherwise.

Step 2: Use Linearity of Expectation

A powerful property of expectation is that it is linear. This means the expectation of a sum is the sum of the expectations:

E[X] = E[X_1 + \dots + X_n] = E[X_1] + E[X_2] + \dots + E[X_n]

Step 3: Find the Expectation of a Single Draw

For any single draw i, what is the probability that it is a success? Since every item in the population has an equal chance of being selected on the i-th draw, the probability is simply the proportion of successes in the initial population.

P(X_i = 1) = \frac{K}{N}

The expected value of an indicator variable is just its probability of being 1. Therefore:

E[X_i] = 1 \cdot P(X_i=1) + 0 \cdot P(X_i=0) = P(X_i=1) = \frac{K}{N}

Crucially, this is true for every single draw, from the first to the last, even though the draws are not independent.
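This non-obvious claim, that even the last draw sees a success with probability K/N, can be checked by simulation. A shuffle of the whole population is equivalent to drawing all N items without replacement; the parameters and trial count below are illustrative:

```python
import random

random.seed(42)
N, K = 20, 5                      # 5 successes in a population of 20
trials = 100_000
population = [1] * K + [0] * (N - K)

# Count how often each draw position i yields a success.
hits = [0] * N
for _ in range(trials):
    random.shuffle(population)    # one shuffle = one full sequence of draws
    for i, item in enumerate(population):
        hits[i] += item

# Every position, from the first draw to the last, hits with frequency ~ K/N.
for i in range(N):
    assert abs(hits[i] / trials - K / N) < 0.01
print("all positions close to K/N =", K / N)
```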

Step 4: Sum the Expectations

Now we substitute this back into our sum. We are adding the same value, K/N, to itself n times.

E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{K}{N} = n \cdot \frac{K}{N}

Final Mean Formula

E[X] = n \cdot \frac{K}{N}
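As a final sanity check on the indicator argument, a quick Monte Carlo estimate of E[X] using the bond-portfolio numbers from earlier (the trial count and seed are arbitrary):

```python
import random
from statistics import fmean

random.seed(0)
N, K, n = 100, 5, 10              # the bond portfolio: 5 defaulters in 100
population = [1] * K + [0] * (N - K)

# X = X_1 + ... + X_n: sum the indicators in each sample of size n.
counts = [sum(random.sample(population, n)) for _ in range(50_000)]

empirical_mean = fmean(counts)
# Theory predicts E[X] = n * K/N = 10 * 5/100 = 0.5.
assert abs(empirical_mean - n * K / N) < 0.02
print(f"empirical mean = {empirical_mean:.4f}")
```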

Applications