Hypergeometric Distribution
Modeling the probability of successes in a sample drawn without replacement.
The Hypergeometric distribution is used when you are sampling from a finite population without replacement. This is the key difference from the Binomial distribution, where trials are independent and the success probability stays constant, as if each item were "replaced" after it is drawn.
The classic example is drawing cards from a deck. If you draw a 5-card hand, what's the probability of getting exactly 2 spades? In finance, this can model credit risk in a portfolio of bonds: if you have a portfolio of 100 bonds and know that 5 will default, what is the probability that if you randomly select 10 bonds for an audit, exactly 1 of them will be a defaulter?
Core Concepts
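The probability of exactly `k` successes in a sample of size `n`, drawn from a population of `N` items that contains `K` successes, is:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

- The numerator calculates the number of ways to achieve the desired outcome: it's the number of ways to choose `k` success items from the `K` available successes ($\binom{K}{k}$), multiplied by the number of ways to choose the remaining `n-k` failure items from the total `N-K` failures ($\binom{N-K}{n-k}$).
- The denominator is the total number of possible outcomes: the number of ways to choose any `n` items from the total population of `N` ($\binom{N}{n}$).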
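As a rough illustration, the counting argument above can be written directly with binomial coefficients. This is a minimal sketch using only the Python standard library; the function name `hypergeom_pmf` and its argument order are illustrative choices, not part of any particular library.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k successes in n draws without replacement
    from a population of N items, K of which are successes."""
    # math.comb raises on negative arguments, so rule those out first.
    if k < 0 or n - k < 0:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible outcomes such as k > K get probability 0.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Bond audit: 100 bonds, 5 defaulters, 10 audited, exactly 1 defaulter found.
p_bond = hypergeom_pmf(1, N=100, K=5, n=10)

# Card hand: 52 cards, 13 spades, 5 drawn, exactly 2 spades.
p_spades = hypergeom_pmf(2, N=52, K=13, n=5)

print(f"bond audit: {p_bond:.4f}, two spades: {p_spades:.4f}")
```

Both calls plug the numbers from the card and bond examples into the same ratio of counts.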
Expected Value (Mean)
The mean is intuitive: it's the sample size `n` multiplied by the initial proportion of successes in the population, `K/N`.

$$E[X] = n \cdot \frac{K}{N}$$
Variance
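The variance is similar to the Binomial variance, `n·p·(1-p)` with `p = K/N`, but includes a "finite population correction factor", `(N-n)/(N-1)`. The factor accounts for the draws not being independent: each draw shrinks the remaining population.

$$\mathrm{Var}(X) = n \cdot \frac{K}{N}\left(1 - \frac{K}{N}\right) \cdot \frac{N-n}{N-1}$$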
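One way to sanity-check the mean and variance is to compute them from the full distribution and compare against the standard closed forms `n·K/N` and `n·(K/N)·(1-K/N)·(N-n)/(N-1)`. The sketch below uses the bond-audit numbers (`N=100`, `K=5`, `n=10`) and exact rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, K, n = 100, 5, 10  # population size, successes in population, sample size

# Exact PMF over the support of the distribution.
k_min, k_max = max(0, n - (N - K)), min(n, K)
pmf = {
    k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
    for k in range(k_min, k_max + 1)
}

# Moments computed directly from the definition.
mean_from_pmf = sum(k * p for k, p in pmf.items())
var_from_pmf = sum((k - mean_from_pmf) ** 2 * p for k, p in pmf.items())

# Closed forms: Binomial-style variance times the finite population correction.
p = Fraction(K, N)
mean_closed = n * p
binomial_var = n * p * (1 - p)
fpc = Fraction(N - n, N - 1)
var_closed = binomial_var * fpc

print("mean:", float(mean_from_pmf), "closed form:", float(mean_closed))
print("variance:", float(var_from_pmf), "closed form:", float(var_closed))
print("binomial variance (no correction):", float(binomial_var))
```

Because the correction factor is below 1 whenever `n > 1`, the Hypergeometric variance comes out smaller than the uncorrected Binomial value.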
Key Derivations
Step 1: Define Indicator Variables
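The derivation below shows where the expected value $E[X] = n \cdot K/N$ comes from. Let $X$ be the total number of successes in our sample of size $n$. We can express $X$ as the sum of $n$ indicator variables:

$$X = I_1 + I_2 + \dots + I_n = \sum_{i=1}^{n} I_i$$

Where $I_i$ is 1 if the $i$-th item drawn is a success, and 0 otherwise.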
Step 2: Use Linearity of Expectation
A powerful property of expectation is that it is linear. This means the expectation of a sum is the sum of the expectations:

$$E[X] = E\left[\sum_{i=1}^{n} I_i\right] = \sum_{i=1}^{n} E[I_i]$$
Step 3: Find the Expectation of a Single Draw
For any single draw $i$, what is the probability that it is a success? Since every item in the population has an equal chance of being selected in the $i$-th draw, the probability is simply the proportion of successes in the initial population:

$$P(I_i = 1) = \frac{K}{N}$$

The expected value of an indicator variable is just its probability of being 1. Therefore:

$$E[I_i] = 1 \cdot P(I_i = 1) + 0 \cdot P(I_i = 0) = \frac{K}{N}$$
Crucially, this is true for every single draw, from the first to the last, even though the draws are not independent.
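The claim that every draw position has the same success probability can be checked empirically. The following is a small simulation sketch, not a proof: it assumes a toy population of 20 items with 5 successes, and the function name, parameters, and seed are illustrative choices.

```python
import random


def draw_position_success_rates(N, K, n, trials, seed=0):
    """Estimate P(i-th draw is a success) for each of the n draw positions."""
    rng = random.Random(seed)
    population = [1] * K + [0] * (N - K)  # 1 marks a success item
    hits = [0] * n
    for _ in range(trials):
        # random.sample draws without replacement and returns the items
        # in selection order, so index i is the (i+1)-th draw.
        sample = rng.sample(population, n)
        for i, item in enumerate(sample):
            hits[i] += item
    return [h / trials for h in hits]


rates = draw_position_success_rates(N=20, K=5, n=4, trials=50_000)
print(rates)  # every entry should be close to K/N = 0.25
```

The estimates for the first and the last draw should agree up to sampling noise, even though the outcome of a later draw depends on the earlier ones.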
Step 4: Sum the Expectations
Now we substitute this back into our sum. We are adding the same value, $\frac{K}{N}$, a total of $n$ times:

$$E[X] = \sum_{i=1}^{n} \frac{K}{N} = n \cdot \frac{K}{N}$$
Applications
Quantitative Finance: Quality Control in Algo Trading
A high-frequency trading firm executed 1000 trades in a day. Due to a data feed error, they know that 50 of these trades were based on faulty data. An auditor randomly selects 80 trades for review. Here `N = 1000`, `K = 50`, and `n = 80`. The firm's risk officer can use the Hypergeometric distribution to calculate the probability that the auditor finds no faulty trades (`k = 0`), or more than five (`k > 5`), to understand their exposure to penalties.
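Taking the figures from the scenario above (1000 trades, 50 faulty, 80 audited), the two probabilities can be sketched as follows. This is an illustrative calculation using only the standard library; the helper `pmf` and the variable names are assumptions made for the example.

```python
from math import comb

N, K, n = 1000, 50, 80  # total trades, faulty trades, trades audited


def pmf(k: int) -> float:
    """P(auditor finds exactly k faulty trades); valid here for 0 <= k <= 50."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Probability the audit finds no faulty trades at all.
p_zero = pmf(0)

# P(k > 5) via the complement: 1 - P(k <= 5).
p_more_than_5 = 1.0 - sum(pmf(k) for k in range(0, 6))

# Expected number of faulty trades in the audit sample, n * K / N.
expected_found = n * K / N

print(f"P(k = 0) = {p_zero:.4f}")
print(f"P(k > 5) = {p_more_than_5:.4f}")
print(f"expected faulty trades found = {expected_found}")
```

Using the complement for the tail keeps the sum to six terms instead of adding up every value from 6 to 50.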