Multinomial Distribution
A generalization of the Binomial distribution for more than two outcomes.
The Multinomial distribution extends the Binomial distribution to situations with more than two possible outcomes for each trial. While Binomial models the number of successes in a series of 'success/failure' trials, Multinomial models the number of times each of a set of possible outcomes occurs.
For example, instead of just a 'win' or 'loss', a trade could result in a 'big win', 'small win', 'breakeven', 'small loss', or 'big loss'. The Multinomial distribution can calculate the probability of observing a specific count for each of these categories over a series of trades.
| Outcome | Probability (p_i) | Desired Count (x_i) |
|---|---|---|
| Win | p_1 | x_1 |
| Loss | p_2 | x_2 |
| Draw | p_3 | x_3 |
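For a concrete Win/Loss/Draw example, the PMF can be evaluated directly with the standard library (the probabilities and counts below are hypothetical stand-ins for the table entries):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` across categories,
    where `probs` are the per-trial category probabilities."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)  # multinomial coefficient
    return coef * prod(p ** x for x, p in zip(counts, probs))

# Hypothetical example: 10 trades, P(Win)=0.5, P(Loss)=0.3, P(Draw)=0.2
print(round(multinomial_pmf([5, 3, 2], [0.5, 0.3, 0.2]), 6))  # ≈ 0.08505
```

Note the integer division for the coefficient: the factorial in the numerator is always exactly divisible by the product in the denominator.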
Core Concepts
The probability mass function (PMF) for $n$ trials across $k$ categories is:

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

where $\sum_i x_i = n$ and $\sum_i p_i = 1$.
- The first term is the **multinomial coefficient**, which counts the number of ways to arrange the outcomes. It's a generalization of the binomial coefficient.
- The second part is the product of the probabilities of achieving the desired count for each outcome category, similar to the Binomial PMF but extended to multiple categories.
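As a sketch of the first bullet, the multinomial coefficient can be built as a chain of binomial coefficients, which makes the generalization explicit (the counts here are arbitrary):

```python
from math import comb, factorial, prod

def multinomial_coef(counts):
    """n! / (x_1! x_2! ... x_k!), built as a chain of binomial
    coefficients: choose x_1 of the n slots, then x_2 of the rest, ..."""
    result, remaining = 1, sum(counts)
    for x in counts:
        result *= comb(remaining, x)
        remaining -= x
    return result

# The chain-of-binomials route and the factorial formula agree
counts = [5, 3, 2]
via_factorials = factorial(sum(counts)) // prod(factorial(x) for x in counts)
print(multinomial_coef(counts), via_factorials)  # both 2520
```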
Key Derivations
Deriving the Expected Value (Mean)
Step 1: Reduce to a Binomial Problem
Consider the count of a single category, $X_i$. We can think of each of the $n$ trials as a simple Bernoulli trial for this category: either the outcome is category $i$ (a "success") or it isn't (a "failure").
The probability of "success" on any given trial is simply $p_i$. The probability of "failure" is $1 - p_i$.
Step 2: Apply the Binomial Mean Formula
Since we are now looking at $n$ trials with a success probability of $p_i$, the random variable $X_i$ follows a Binomial distribution: $X_i \sim \text{Binomial}(n, p_i)$.
The mean of a Binomial distribution is $np$. Therefore:

$$E[X_i] = n p_i$$
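A quick seeded simulation with NumPy's multinomial sampler illustrates the result: the empirical mean of each category count approaches n * p_i (the probabilities below are hypothetical):

```python
import numpy as np

# Seeded simulation: the empirical mean of each category count
# should approach n * p_i (here 10 * [0.5, 0.3, 0.2]).
rng = np.random.default_rng(0)
n, probs = 10, [0.5, 0.3, 0.2]
samples = rng.multinomial(n, probs, size=200_000)  # one experiment per row
print(samples.mean(axis=0))  # ≈ [5.0, 3.0, 2.0]
```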
Deriving the Variance
Step 1: Apply the Binomial Variance Formula
Just as with the mean, we can use the properties of the Binomial distribution we've established for $X_i$.
The variance of a Binomial distribution is $np(1-p)$. Substituting $p_i$ for $p$:

$$\text{Var}(X_i) = n p_i (1 - p_i)$$
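The same simulation idea checks the variance formula (again with hypothetical probabilities):

```python
import numpy as np

# Seeded simulation: compare the empirical variance of each category
# count to the theoretical value n * p_i * (1 - p_i).
rng = np.random.default_rng(1)
n, probs = 10, np.array([0.5, 0.3, 0.2])
samples = rng.multinomial(n, probs, size=200_000)
empirical_var = samples.var(axis=0)
theoretical = n * probs * (1 - probs)
print(empirical_var, theoretical)  # empirical ≈ [2.5, 2.1, 1.6]
```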
Applications
Machine Learning: Text Classification
A "Bag of Words" model in NLP treats a document as a collection of word counts. The Multinomial distribution is the fundamental assumption behind models like Naive Bayes for text classification. It's used to model the probability of observing the word counts in a document, given that the document belongs to a certain category (e.g., 'spam', 'finance', 'sports').
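A minimal hand-rolled sketch of multinomial Naive Bayes (with made-up toy data and add-one smoothing) makes the assumption concrete: each class is a bag of word probabilities, and a document is scored by its multinomial log-likelihood under each class:

```python
import math
from collections import Counter

# Toy corpus: (word list, label) pairs -- entirely hypothetical data
docs = [
    (["cheap", "pills", "buy", "now"], "spam"),
    (["meeting", "budget", "report"], "finance"),
    (["buy", "stocks", "budget"], "finance"),
    (["cheap", "cheap", "now"], "spam"),
]

vocab = {w for words, _ in docs for w in words}
labels = {label for _, label in docs}

# Per-class word counts; smoothing is applied at prediction time
counts = {c: Counter() for c in labels}
class_docs = Counter(label for _, label in docs)
for words, label in docs:
    counts[label].update(words)

def predict(words):
    """Score each class by log P(c) + sum over words of log P(w | c),
    with add-one (Laplace) smoothing over the vocabulary."""
    best, best_score = None, -math.inf
    for c in labels:
        total = sum(counts[c].values()) + len(vocab)
        score = math.log(class_docs[c] / len(docs))
        for w in words:
            score += math.log((counts[c][w] + 1) / total)
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(["cheap", "buy", "now"]))  # "spam" on this toy data
```

Working in log space avoids underflow from multiplying many small probabilities; the multinomial coefficient is omitted because it is the same for every class and cancels when comparing scores.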