Lesson 2.2: The Multivariate Normal Distribution (MVN)
We now generalize the bell curve to handle multiple, correlated variables at once. This lesson introduces the Multivariate Normal (MVN) distribution, defined by its mean vector and the all-important Covariance Matrix (Σ). Mastering the MVN is the key to understanding portfolio risk, factor models, and the theoretical underpinnings of linear regression.
Part 1: From a Single Number to a System
In the real world, variables don't live in isolation. A company's stock return is related to the overall market's return. A person's income is related to their years of education. To model these systems, we need more than a single mean ($\mu$) and a single variance ($\sigma^2$).
The Core Idea: The Multivariate Normal distribution describes a system of variables by replacing the single $\mu$ and $\sigma^2$ with a mean vector ($\mu$, now a vector) and a covariance matrix ($\Sigma$).
Definition: The Multivariate Normal (MVN) Distribution
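A vector of random variables, $X = (X_1, \dots, X_k)^\top$, is said to follow an MVN distribution with mean vector $\mu$ and covariance matrix $\Sigma$, written:
$$X \sim \mathcal{N}(\mu, \Sigma)$$
The mean vector $\mu = (\mu_1, \dots, \mu_k)^\top$ simply lists the individual mean of each variable in the system.
The covariance matrix $\Sigma$ is the engine of the MVN. It stores all the information about the spread and linear relationships in the system.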
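As a minimal sketch of the definition, the snippet below draws samples from a 2-variable MVN with NumPy and checks that the sample mean and sample covariance land close to the parameters. The numbers in `mu` and `Sigma` are made up for illustration and are not taken from any real data.

```python
import numpy as np

# Illustrative (assumed) parameters for a 2-variable system.
mu = np.array([0.05, 0.08])              # mean vector
Sigma = np.array([[0.040, 0.018],
                  [0.018, 0.090]])       # covariance matrix (symmetric)

# Draw many samples from N(mu, Sigma); each row is one draw of (X1, X2).
rng = np.random.default_rng(seed=0)
samples = rng.multivariate_normal(mu, Sigma, size=100_000)

# The sample statistics should be close to the parameters we chose.
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)   # columns are variables

print("sample mean:", sample_mean)
print("sample covariance:\n", sample_cov)
```

With this many draws the estimates should sit very near `mu` and `Sigma`, which is one way to see that these two objects pin down the whole distribution.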
Part 2: Deconstructing the Covariance Matrix
Anatomy of the Covariance Matrix (Σ)
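For a 2-variable system ($X = (X_1, X_2)^\top$), the covariance matrix is a simple 2x2 matrix:
$$\Sigma = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}$$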
- The Diagonal (Top-Left to Bottom-Right): Contains the individual **variances** of each variable. This is the "risk" of each component in isolation.
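- The Off-Diagonals: Contain the **covariances** between pairs of variables. This captures the "interaction risk" or how the variables move together.
- Symmetry: Since $\mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(X_2, X_1)$, the matrix is always symmetric around its main diagonal.
Imagine a contour plot of the bivariate Normal density: its level curves are ellipses centered at the mean. A positive covariance (off-diagonal) makes the ellipses tilt from bottom-left to top-right, while a negative covariance tilts them from top-left to bottom-right.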
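To make the anatomy concrete, here is a small sketch that assembles a 2x2 covariance matrix from two standard deviations and a correlation, confirms its symmetry, and recovers the tilt of the contour ellipses from an eigen-decomposition. The values of `sigma_1`, `sigma_2`, and `rho` are assumptions chosen only for illustration.

```python
import numpy as np

# Illustrative (assumed) inputs.
sigma_1, sigma_2 = 0.2, 0.3    # standard deviations of X1 and X2
rho = 0.5                      # correlation between X1 and X2

# Covariance = correlation * product of standard deviations.
cov_12 = rho * sigma_1 * sigma_2
Sigma = np.array([[sigma_1**2, cov_12],
                  [cov_12,     sigma_2**2]])

# Symmetry check: Cov(X1, X2) == Cov(X2, X1).
is_symmetric = np.allclose(Sigma, Sigma.T)

# eigh returns eigenvalues in ascending order; the eigenvector of the
# largest eigenvalue points along the long axis of the contour ellipses.
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
major_axis = eigenvectors[:, -1]
slope = major_axis[1] / major_axis[0]   # > 0 means bottom-left to top-right

print("Sigma:\n", Sigma)
print("symmetric:", is_symmetric)
print("slope of major axis:", slope)
```

Flipping the sign of `rho` should flip the sign of `slope`, matching the tilt described above.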
Part 3: The Three Superpowers of the MVN
The MVN is the workhorse of classical statistics because it has three incredibly convenient mathematical properties.
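Any linear combination of the elements of an MVN vector is also Normally distributed (either univariate or multivariate). If $X \sim \mathcal{N}(\mu, \Sigma)$, then for any constant matrix $A$ and constant vector $b$:
$$AX + b \sim \mathcal{N}(A\mu + b,\; A \Sigma A^\top)$$
Payoff: This is why portfolio returns ($R_p = w^\top R$, a weighted sum of asset returns) and OLS estimators ($\hat{\beta}$, which is linear in $y$) are treated as Normal: each is a linear combination of inputs that are assumed to be jointly Normal.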
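The portfolio case can be sketched in a few lines. Assuming two assets with jointly Normal returns and hypothetical weights `w`, the portfolio return is a linear combination, so its mean is `w @ mu` and its variance is `w @ Sigma @ w`. All numbers below are illustrative assumptions.

```python
import numpy as np

# Illustrative (assumed) asset-return parameters and portfolio weights.
mu = np.array([0.05, 0.08])
Sigma = np.array([[0.040, 0.018],
                  [0.018, 0.090]])
w = np.array([0.6, 0.4])

# Linear-combination property with A = w^T and b = 0.
port_mean = w @ mu            # 0.6*0.05 + 0.4*0.08
port_var = w @ Sigma @ w      # w^T Sigma w

# Simulation check: weight simulated asset returns and compare moments.
rng = np.random.default_rng(seed=1)
asset_returns = rng.multivariate_normal(mu, Sigma, size=200_000)
port_returns = asset_returns @ w
empirical_mean = port_returns.mean()
empirical_var = port_returns.var()

print("theoretical mean, var:", port_mean, port_var)
print("simulated   mean, var:", empirical_mean, empirical_var)
```

Note that the variance includes the cross term `2 * w[0] * w[1] * Sigma[0, 1]`, which is where the off-diagonal "interaction risk" enters the portfolio.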
If a vector of variables is jointly MVN, then the individual distribution of any single variable within that vector is a simple univariate Normal.
Payoff: This allows us to look at the individual t-statistic for a single regression coefficient ($\hat{\beta}_j$) even though it was estimated as part of a large multivariate system.
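In practice, reading off a marginal amounts to indexing: the marginal of variable `i` uses entry `i` of the mean vector and the `i`-th diagonal entry of the covariance matrix. The sketch below uses a hypothetical 3-variable system, and also shows the sub-vector version (picking the matching rows and columns), which is a standard extension of the single-variable statement above.

```python
import numpy as np

# Illustrative (assumed) 3-variable system.
mu = np.array([0.05, 0.08, 0.03])
Sigma = np.array([[0.040, 0.018, 0.006],
                  [0.018, 0.090, 0.012],
                  [0.006, 0.012, 0.010]])

# Marginal of a single variable: X_i ~ N(mu[i], Sigma[i, i]).
i = 1
marg_mean = mu[i]
marg_var = Sigma[i, i]

# Marginal of a sub-vector: keep the matching entries and sub-block.
idx = [0, 2]
sub_mu = mu[idx]
sub_Sigma = Sigma[np.ix_(idx, idx)]

# Simulation check: ignore the other columns and look at column i alone.
rng = np.random.default_rng(seed=2)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
empirical_var = samples[:, i].var()

print("marginal of X_i: mean", marg_mean, "variance", marg_var)
print("simulated variance of X_i:", empirical_var)
print("sub-vector covariance:\n", sub_Sigma)
```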
If you take a "slice" through an MVN distribution (i.e., you fix the value of one variable), the resulting distribution of the other variable is still Normal.
Payoff: The mean of this conditional distribution, $E[X_1 \mid X_2 = x_2]$, turns out to be a linear function of $x_2$. This is the formal mathematical justification for the entire framework of linear regression.
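As a preview, here is a minimal sketch of the bivariate case using the standard textbook formulas: the conditional mean is `mu_1 + (cov_12 / var_2) * (x2 - mu_2)` and the conditional variance is `var_1 - cov_12**2 / var_2`. The parameter values and the helper name `conditional_mean_x1` are assumptions made for illustration.

```python
import numpy as np

# Illustrative (assumed) bivariate parameters.
mu = np.array([0.05, 0.08])
Sigma = np.array([[0.040, 0.018],
                  [0.018, 0.090]])

# Slope of the conditional mean: Cov(X1, X2) / Var(X2).
# This is the same ratio as the population regression slope of X1 on X2.
slope = Sigma[0, 1] / Sigma[1, 1]

def conditional_mean_x1(x2):
    """E[X1 | X2 = x2] for the bivariate Normal above (hypothetical helper)."""
    return mu[0] + slope * (x2 - mu[1])

# The conditional variance does not depend on the observed value x2.
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

for x2 in (0.08, 0.18, 0.28):
    print(f"x2 = {x2:.2f} -> E[X1 | X2 = x2] = {conditional_mean_x1(x2):.4f}")
print("conditional variance:", cond_var)
```

Each equal step in `x2` moves the conditional mean by the same amount, which is the linearity the payoff refers to; the conditional variance is also smaller than the unconditional `Sigma[0, 0]` whenever the covariance is nonzero.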
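For general random variables, we proved that independence implies zero covariance, but not the other way around:
$$X_1 \perp X_2 \implies \mathrm{Cov}(X_1, X_2) = 0$$
For random variables that are jointly **Multivariate Normal**, this relationship becomes an "if and only if" condition.
MVN: $X_1 \perp X_2 \iff \mathrm{Cov}(X_1, X_2) = 0$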
This is a massive simplification. In finance, if we assume asset returns are MVN, we only need to check their correlation. If it's zero, we can treat them as fully independent, which dramatically simplifies risk models.
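One way to see the equivalence numerically: when the off-diagonal entry is zero, the joint MVN density factorizes into the product of the two univariate Normal densities, which is exactly what independence means. The sketch below writes out the standard density formulas by hand; the helper names `normal_pdf` and `mvn_pdf` and all parameter values are assumptions for illustration.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Univariate Normal density (hypothetical helper)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mvn_pdf(x, mu, Sigma):
    """Multivariate Normal density (hypothetical helper)."""
    k = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# Illustrative (assumed) parameters and evaluation point.
mu = np.array([0.05, 0.08])
Sigma_uncorr = np.array([[0.04, 0.0],
                         [0.0, 0.09]])      # zero covariance
Sigma_corr = np.array([[0.040, 0.018],
                       [0.018, 0.090]])     # same variances, nonzero covariance
x = np.array([0.25, -0.22])

product_of_marginals = normal_pdf(x[0], mu[0], 0.04) * normal_pdf(x[1], mu[1], 0.09)
joint_uncorr = mvn_pdf(x, mu, Sigma_uncorr)
joint_corr = mvn_pdf(x, mu, Sigma_corr)

print("product of marginals:     ", product_of_marginals)
print("joint, zero covariance:   ", joint_uncorr)   # matches the product
print("joint, nonzero covariance:", joint_corr)     # does not match
```

The match in the zero-covariance case holds at every point `x`, not just the one evaluated here. Keep in mind that the result relies on the variables being jointly MVN, not merely each being Normal on its own.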
What's Next? Performing Surgery on the MVN
We've defined the MVN and its powerful properties. But how do we use this structure to answer specific questions, like "What is the distribution of just the first two assets in my 10-asset portfolio?" or "Given the market went up today, what is the new expected return for my stock?"
The next lesson gives us the precise formulas for this 'statistical surgery'. We will derive the exact forms of the **Marginal and Conditional Distributions** of the MVN, a crucial theoretical step before we can apply these ideas.