Multinomial Distribution
A generalization of the Binomial distribution for more than two outcomes.
The Multinomial distribution extends the Binomial distribution to situations with more than two possible outcomes for each trial. While Binomial models the number of successes in a series of 'success/failure' trials, Multinomial models the number of times each of a set of possible outcomes occurs.
For example, instead of just a 'win' or 'loss', a trade could result in a 'big win', 'small win', 'breakeven', 'small loss', or 'big loss'. The Multinomial distribution can calculate the probability of observing a specific count for each of these categories over a series of trades.
| Outcome | Probability (p_i) | Desired Count (x_i) |
|---|---|---|
| Win | p_1 | x_1 |
| Loss | p_2 | x_2 |
| Draw | p_3 | x_3 |
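For a concrete Win/Loss/Draw example, the PMF can be evaluated directly with the standard library (the probabilities and counts below are hypothetical stand-ins for the table entries):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` across categories,
    where `probs` are the per-trial category probabilities."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)  # multinomial coefficient
    return coef * prod(p ** x for x, p in zip(counts, probs))

# Hypothetical example: 10 trades, P(Win)=0.5, P(Loss)=0.3, P(Draw)=0.2
print(round(multinomial_pmf([5, 3, 2], [0.5, 0.3, 0.2]), 6))  # ≈ 0.08505
```

Note the integer division for the coefficient: the factorial in the numerator is always exactly divisible by the product in the denominator.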
Core Concepts
The probability mass function (PMF) for $n$ trials across $k$ categories is:

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \, x_2! \cdots x_k!} \, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

where $\sum_i x_i = n$ and $\sum_i p_i = 1$.
- The first term is the **multinomial coefficient**, which counts the number of ways to arrange the outcomes. It's a generalization of the binomial coefficient.
- The second part is the product of the probabilities of achieving the desired count for each outcome category, similar to the Binomial PMF but extended to multiple categories.
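As a sketch of the first bullet, the multinomial coefficient can be built as a chain of binomial coefficients, which makes the generalization explicit (the counts here are arbitrary):

```python
from math import comb, factorial, prod

def multinomial_coef(counts):
    """n! / (x_1! x_2! ... x_k!), built as a chain of binomial
    coefficients: choose x_1 of the n slots, then x_2 of the rest, ..."""
    result, remaining = 1, sum(counts)
    for x in counts:
        result *= comb(remaining, x)
        remaining -= x
    return result

# The chain-of-binomials route and the factorial formula agree
counts = [5, 3, 2]
via_factorials = factorial(sum(counts)) // prod(factorial(x) for x in counts)
print(multinomial_coef(counts), via_factorials)  # both 2520
```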
Key Derivations
Deriving the Expected Value (Mean)
Step 1: Reduce to a Binomial Problem
Consider the count of a single category, $X_i$. We can think of each of the $n$ trials as a simple Bernoulli trial for this category: either the outcome is category $i$ (a "success") or it isn't (a "failure").
The probability of "success" on any given trial is simply $p_i$. The probability of "failure" is $1 - p_i$.
Step 2: Apply the Binomial Mean Formula
Since we are now looking at $n$ trials with a success probability of $p_i$, the random variable $X_i$ follows a Binomial distribution: $X_i \sim \text{Binomial}(n, p_i)$.
The mean of a Binomial distribution is $np$. Therefore:

$$E[X_i] = n p_i$$
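A quick seeded simulation with NumPy's multinomial sampler illustrates the result: the empirical mean of each category count approaches n * p_i (the probabilities below are hypothetical):

```python
import numpy as np

# Seeded simulation: the empirical mean of each category count
# should approach n * p_i (here 10 * [0.5, 0.3, 0.2]).
rng = np.random.default_rng(0)
n, probs = 10, [0.5, 0.3, 0.2]
samples = rng.multinomial(n, probs, size=200_000)  # one experiment per row
print(samples.mean(axis=0))  # ≈ [5.0, 3.0, 2.0]
```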
Deriving the Variance
Step 1: Apply the Binomial Variance Formula
Just as with the mean, we can use the properties of the Binomial distribution we've established for $X_i$.
The variance of a Binomial distribution is $np(1-p)$. Substituting $p_i$ for $p$:

$$\text{Var}(X_i) = n p_i (1 - p_i)$$
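The same simulation idea checks the variance formula (again with hypothetical probabilities):

```python
import numpy as np

# Seeded simulation: compare the empirical variance of each category
# count to the theoretical value n * p_i * (1 - p_i).
rng = np.random.default_rng(1)
n, probs = 10, np.array([0.5, 0.3, 0.2])
samples = rng.multinomial(n, probs, size=200_000)
empirical_var = samples.var(axis=0)
theoretical = n * probs * (1 - probs)
print(empirical_var, theoretical)  # empirical ≈ [2.5, 2.1, 1.6]
```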
Applications
Machine Learning: Text Classification
A "Bag of Words" model in NLP treats a document as a collection of word counts. The Multinomial distribution is the fundamental assumption behind models like Naive Bayes for text classification. It's used to model the probability of observing the word counts in a document, given that the document belongs to a certain category (e.g., 'spam', 'finance', 'sports').
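A minimal hand-rolled sketch of multinomial Naive Bayes (with made-up toy data and add-one smoothing) makes the assumption concrete: each class is a bag of word probabilities, and a document is scored by its multinomial log-likelihood under each class:

```python
import math
from collections import Counter

# Toy corpus: (word list, label) pairs -- entirely hypothetical data
docs = [
    (["cheap", "pills", "buy", "now"], "spam"),
    (["meeting", "budget", "report"], "finance"),
    (["buy", "stocks", "budget"], "finance"),
    (["cheap", "cheap", "now"], "spam"),
]

vocab = {w for words, _ in docs for w in words}
labels = {label for _, label in docs}

# Per-class word counts; smoothing is applied at prediction time
counts = {c: Counter() for c in labels}
class_docs = Counter(label for _, label in docs)
for words, label in docs:
    counts[label].update(words)

def predict(words):
    """Score each class by log P(c) + sum over words of log P(w | c),
    with add-one (Laplace) smoothing over the vocabulary."""
    best, best_score = None, -math.inf
    for c in labels:
        total = sum(counts[c].values()) + len(vocab)
        score = math.log(class_docs[c] / len(docs))
        for w in words:
            score += math.log((counts[c][w] + 1) / total)
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(["cheap", "buy", "now"]))  # "spam" on this toy data
```

Working in log space avoids underflow from multiplying many small probabilities; the multinomial coefficient is omitted because it is the same for every class and cancels when comparing scores.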