Hypergeometric Distribution

Modeling the probability of successes in a sample drawn without replacement.

The "Drawing from a Deck" Distribution

The Hypergeometric distribution is used when you are sampling from a finite population without replacement. This is a key difference from the Binomial distribution, where each trial is independent because you are "replacing" after each draw.

The classic example is drawing cards from a deck. If you draw a 5-card hand, what's the probability of getting exactly 2 spades? In finance, this can model credit risk in a portfolio of bonds: if you have a portfolio of 100 bonds and know that 5 will default, what is the probability that if you randomly select 10 bonds for an audit, exactly 1 of them will be a defaulter?

For the card example, the parameters are population size $N = 52$, population successes $K = 13$ (the spades), and sample size $n = 5$. The mean number of spades per hand is then $\mu = 1.25$.
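Both motivating examples can be checked by counting combinations directly. The sketch below uses only Python's standard library (`math.comb`, Python 3.8+). The helper names `hypergeom_pmf` and `binom_pmf` are illustrative, not library functions. For contrast, it also computes the Binomial probability for the card example, which is what you would get if each card were replaced before the next draw.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials (sampling with replacement)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 2 spades in a 5-card hand (N=52 cards, K=13 spades, n=5 drawn).
p_cards = hypergeom_pmf(2, N=52, K=13, n=5)

# The same question if every card were replaced before the next draw.
p_cards_with_replacement = binom_pmf(2, n=5, p=13 / 52)

# Exactly 1 defaulter when auditing 10 of 100 bonds, 5 of which default.
p_bonds = hypergeom_pmf(1, N=100, K=5, n=10)

print(f"2 spades, without replacement:  {p_cards:.4f}")
print(f"2 spades, with replacement:     {p_cards_with_replacement:.4f}")
print(f"1 defaulter in a 10-bond audit: {p_bonds:.4f}")
```

The two card probabilities differ only because the draws are dependent; as the population grows relative to the sample, they converge.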

Core Concepts

Probability Mass Function (PMF)

The PMF gives the probability of getting exactly `k` successes in a sample of size `n` drawn without replacement from a population of `N` items, `K` of which are successes.

P(X=k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}

The numerator calculates the number of ways to achieve the desired outcome: it's the number of ways to choose `k` success items from the `K` available successes ($\binom{K}{k}$), multiplied by the number of ways to choose the remaining `n-k` failure items from the total `N-K` failures ($\binom{N-K}{n-k}$).
The denominator is the total number of possible outcomes: the number of ways to choose any `n` items from the total population of `N` ($\binom{N}{n}$).
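In code, the formula needs one extra detail: $k$ can only range from $\max(0, n-(N-K))$ to $\min(n, K)$, because a sample cannot contain more successes or more failures than the population holds. A minimal sketch using `fractions.Fraction` for exact arithmetic (function names are illustrative) tabulates the card-example distribution and confirms the probabilities sum to exactly 1, which is Vandermonde's identity $\sum_k \binom{K}{k}\binom{N-K}{n-k} = \binom{N}{n}$.

```python
from fractions import Fraction
from math import comb

def support(N: int, K: int, n: int) -> range:
    """Values of k with nonzero probability."""
    return range(max(0, n - (N - K)), min(n, K) + 1)

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Exact P(X = k); zero outside the support."""
    if k not in support(N, K, n):
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 52, 13, 5  # 5-card hand, counting spades
dist = {k: hypergeom_pmf(k, N, K, n) for k in support(N, K, n)}

for k, p in dist.items():
    print(f"P(X={k}) = {float(p):.4f}")

# The numerators sum to C(N, n) (Vandermonde's identity), so the PMF sums to 1.
assert sum(dist.values()) == 1
```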


Expected Value & Variance

Expected Value (Mean)

E[X] = n \cdot \frac{K}{N}

The mean is intuitive: it's the sample size `n` multiplied by the initial proportion of successes in the population, `K/N`.
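For the card example, $E[X] = 5 \cdot 13/52 = 1.25$ spades per hand. A short exact check (standard library only; the helper `mean_from_pmf` is illustrative) confirms that the shortcut agrees with $\sum_k k \, P(X=k)$ computed term by term from the PMF.

```python
from fractions import Fraction
from math import comb

def mean_from_pmf(N: int, K: int, n: int) -> Fraction:
    """E[X] = sum of k * P(X = k) over the support, in exact arithmetic."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    total = comb(N, n)
    return sum(
        Fraction(k * comb(K, k) * comb(N - K, n - k), total)
        for k in range(lo, hi + 1)
    )

N, K, n = 52, 13, 5
shortcut = Fraction(n * K, N)            # n * K / N
print(mean_from_pmf(N, K, n), shortcut)  # both equal 5/4
```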

Variance

Var(X) = n \frac{K}{N} (1 - \frac{K}{N}) \frac{N-n}{N-1}

The variance is the Binomial variance with $p = K/N$, multiplied by a "finite population correction factor" $\frac{N-n}{N-1}$. The factor accounts for the dependence between draws: each draw removes an item from the population, so for $n > 1$ the variance is smaller than in the Binomial case.
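For the card example, the Binomial variance would be $5 \cdot 0.25 \cdot 0.75 = 0.9375$; the correction factor $(52-5)/(52-1) = 47/51$ shrinks it to about $0.864$. The factor equals 1 when $n = 1$ and 0 when $n = N$, where the whole population is drawn and $X = K$ with certainty. The sketch below (illustrative helper names, exact arithmetic) checks the closed form against $E[X^2] - E[X]^2$ computed from the PMF.

```python
from fractions import Fraction
from math import comb

def pmf_moments(N: int, K: int, n: int) -> tuple:
    """Return (E[X], Var(X)) computed directly from the PMF."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    total = comb(N, n)
    probs = {k: Fraction(comb(K, k) * comb(N - K, n - k), total)
             for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in probs.items())
    second = sum(k * k * p for k, p in probs.items())  # E[X^2]
    return mean, second - mean**2

def closed_form_var(N: int, K: int, n: int) -> Fraction:
    p = Fraction(K, N)
    fpc = Fraction(N - n, N - 1)  # finite population correction
    return n * p * (1 - p) * fpc

N, K, n = 52, 13, 5
_, var_pmf = pmf_moments(N, K, n)
var_binomial = n * Fraction(K, N) * (1 - Fraction(K, N))
print(float(var_binomial), float(var_pmf), float(closed_form_var(N, K, n)))
```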

Key Derivations

Deriving the Expected Value (Mean)

While deriving the mean directly from the PMF is algebraically complex, we can use a more elegant method involving indicator variables.

Step 1: Define Indicator Variables

Let $X$ be the total number of successes in our sample of size $n$. We can express $X$ as the sum of $n$ indicator variables:

X = X_1 + X_2 + \dots + X_n

Where $X_i$ is 1 if the $i$-th item drawn is a success, and 0 otherwise.
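To make the decomposition concrete, the simulation below (a sketch; the seed and trial count are arbitrary choices) draws 5 cards without replacement with `random.sample`, which returns items in selection order, records each indicator $X_i$, and forms $X$ as their sum. The per-position averages should all be close to $K/N = 0.25$.

```python
import random

N, K, n = 52, 13, 5         # cards 0..12 play the role of the spades (successes)
TRIALS = 100_000
rng = random.Random(42)     # fixed seed so the run is reproducible

position_hits = [0] * n     # running totals of X_i for each draw position i
total_X = 0
for _ in range(TRIALS):
    hand = rng.sample(range(N), n)                        # ordered draw, no replacement
    indicators = [1 if card < K else 0 for card in hand]  # X_1, ..., X_n
    X = sum(indicators)                                   # X = X_1 + ... + X_n
    position_hits = [h + x for h, x in zip(position_hits, indicators)]
    total_X += X

print("E[X_i] estimates:", [round(h / TRIALS, 3) for h in position_hits])
print("E[X] estimate:", total_X / TRIALS)
```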

Step 2: Use Linearity of Expectation

A powerful property of expectation is that it is linear: the expectation of a sum is the sum of the expectations, whether or not the variables are independent:

E[X] = E[X_1 + \dots + X_n] = E[X_1] + E[X_2] + \dots + E[X_n]

Step 3: Find the Expectation of a Single Draw

For any single draw $i$, what is the probability that it is a success? Since every item in the population has an equal chance of being selected in the $i$-th draw, the probability is simply the proportion of successes in the initial population.

P(X_i = 1) = \frac{K}{N}

The expected value of an indicator variable is just its probability of being 1. Therefore:

E[X_i] = 1 \cdot P(X_i=1) + 0 \cdot P(X_i=0) = P(X_i=1) = \frac{K}{N}

Crucially, this is true for every single draw, from the first to the last, even though the draws are not independent.
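The symmetry argument can be verified exactly on a population small enough to enumerate. The sketch below (the sizes are arbitrary illustrative choices) lists every ordered sample of $n = 3$ items from $N = 6$ items, $K = 2$ of which are successes, and confirms $P(X_i = 1) = K/N = 1/3$ at each position, even though the draws are dependent: $P(X_2 = 1 \mid X_1 = 1) = (K-1)/(N-1) = 1/5$.

```python
from fractions import Fraction
from itertools import permutations

N, K, n = 6, 2, 3                          # items 0 and 1 are the successes
samples = list(permutations(range(N), n))  # all equally likely ordered draws
total = len(samples)                       # 6 * 5 * 4 = 120

# P(X_i = 1) for each draw position i
p_success = [Fraction(sum(s[i] < K for s in samples), total) for i in range(n)]
print(p_success)  # every entry equals K/N

# Dependence: given a success on draw 1, draw 2 is less likely to be a success
first_hit = [s for s in samples if s[0] < K]
p_second_given_first = Fraction(sum(s[1] < K for s in first_hit), len(first_hit))
print(p_second_given_first)  # (K - 1)/(N - 1)
```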

Step 4: Sum the Expectations

Now we substitute this back into our sum. We are adding the same value, $K/N$, to itself $n$ times.

E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{K}{N} = n \cdot \frac{K}{N}

Final Mean Formula

E[X] = n \frac{K}{N}
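As a cross-check, the same result follows directly from the PMF using two standard identities: the absorption identity $k\binom{K}{k} = K\binom{K-1}{k-1}$ and Vandermonde's identity $\sum_j \binom{a}{j}\binom{b}{m-j} = \binom{a+b}{m}$ (here with $a = K-1$, $b = N-K$, $m = n-1$, $j = k-1$):

E[X] = \sum_{k} k \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} = \frac{K}{\binom{N}{n}} \sum_{k \ge 1} \binom{K-1}{k-1} \binom{N-K}{n-k} = \frac{K \binom{N-1}{n-1}}{\binom{N}{n}} = n \frac{K}{N}

The last step uses $\binom{N}{n} = \frac{N}{n}\binom{N-1}{n-1}$.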

Applications

Quantitative Finance: Quality Control in Algo Trading

A high-frequency trading firm executed 1000 trades in a day. Due to a data feed error, the firm knows that 50 of these trades were based on faulty data. An auditor randomly selects 80 trades for review. The firm's risk officer can use the Hypergeometric distribution to calculate the probability that the auditor finds no faulty trades (`k = 0`) or more than five (`k > 5`), to gauge the firm's exposure to penalties.
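A sketch of the risk officer's calculation with $N = 1000$, $K = 50$, $n = 80$, using exact integer arithmetic from the standard library (the helper name is illustrative). If SciPy is available, `scipy.stats.hypergeom` provides the same distribution, but its parameters are `(M, n, N)` = (population size, number of successes, sample size), which differs from this article's notation.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """Exact P(X = k) for n draws without replacement; 0 outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 1000, 50, 80  # trades, faulty trades, trades audited

p_none = hypergeom_pmf(0, N, K, n)
# P(X > 5) = 1 - P(X <= 5)
p_more_than_5 = 1 - sum(hypergeom_pmf(k, N, K, n) for k in range(6))

print(f"P(X = 0) = {float(p_none):.4f}")
print(f"P(X > 5) = {float(p_more_than_5):.4f}")
```

Using the complement keeps the tail calculation to six terms instead of summing $k = 6$ through $50$.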