Lesson 1.12: Thinking in Multiple Dimensions: Joint Distributions
We now make the most important leap in probability theory: from analyzing a single variable (X) to analyzing two or more variables (X, Y) together. This lesson introduces Joint Distributions, the tool for understanding the probability of combined outcomes. Mastering this is the absolute prerequisite for understanding correlation, covariance, and regression.
Part 1: The Probability Landscape
The Core Idea (Analogy): Think of a single variable's distribution as a 2D chart. A joint distribution is a 3D landscape or a topographical map. The location is given by a pair of values (x, y), and the 'altitude' at that point tells you the probability (or probability density) of that specific combination occurring.
Imagine a 3D plot with a 'mountain' over the (x,y) plane. The height of the mountain at any point is the joint probability density.
Whether we are looking at stock returns, user engagement metrics, or medical data, we almost never care about one variable in isolation. We care about the relationship between variables. To do that, we need a function that gives us the probability of $X = x$ AND $Y = y$ happening together.
Discrete: Joint PMF (JPMF)
Gives the probability mass at a specific coordinate point: $p_{X,Y}(x, y) = P(X = x, Y = y)$.
Continuous: Joint PDF (JPDF)
A surface $f_{X,Y}(x, y)$ where the volume under it over a region $A$ gives the probability: $P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$.
The Rules are the Same (Just in Higher Dimensions)
- Non-Negativity: $f_{X,Y}(x, y) \ge 0$ (and $p_{X,Y}(x, y) \ge 0$) for all $(x, y)$.
- Total Probability is One: The total sum (or total volume) must be 1: $\sum_x \sum_y p_{X,Y}(x, y) = 1$ in the discrete case, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1$ in the continuous case. A quick computational check appears in the sketch below.
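Here is a minimal sketch in Python (with NumPy, using made-up illustrative numbers rather than the credit-risk table that appears later) that verifies both rules for a small joint PMF table:

```python
import numpy as np

# Hypothetical joint PMF of two discrete variables: rows index X, columns index Y.
# The specific numbers are illustrative only.
joint_pmf = np.array([
    [0.10, 0.20, 0.05],
    [0.15, 0.30, 0.20],
])

# Rule 1: every entry must be non-negative.
assert np.all(joint_pmf >= 0), "Joint PMF has a negative probability"

# Rule 2: the total probability mass must sum to 1.
assert np.isclose(joint_pmf.sum(), 1.0), "Joint PMF does not sum to 1"

print("Valid joint PMF with shape", joint_pmf.shape)
```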
Part 2: Marginal Distributions - Seeing the Shadows of the Landscape
Often, we have the complex joint distribution, but we just want to know the individual distribution of $X$, ignoring $Y$. This is called the **marginal distribution**.
In our landscape analogy, finding the marginal distribution of X is like standing on the Y-axis and looking at the "shadow" the entire 3D mountain casts onto the X-Z plane. To get that shadow, we have to flatten the dimension we don't care about.
Definition: The Marginal Distribution ('Integrating Out')
To find the marginal distribution of one variable, we sum (for discrete) or integrate (for continuous) the joint distribution over all possible values of the *other* variable.
Marginal PMF of X: $p_X(x) = \sum_y p_{X,Y}(x, y)$
Marginal PDF of X: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
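For the continuous case, here is a small numerical sketch (assuming a bivariate normal joint PDF with correlation 0.6 and using SciPy for the integration, neither of which comes from the lesson itself) that integrates out $Y$ to recover the marginal density of $X$ at one point:

```python
import numpy as np
from scipy import integrate, stats

# Illustrative joint PDF: a bivariate normal with correlation rho (an assumed example).
rho = 0.6
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def joint_pdf(x, y):
    return joint.pdf([x, y])

# Marginal PDF of X at x = 0.5: integrate the joint PDF over all y.
x0 = 0.5
marginal_at_x0, _ = integrate.quad(lambda y: joint_pdf(x0, y), -np.inf, np.inf)

# For a bivariate normal, the X marginal is a standard normal, so we can check:
print(marginal_at_x0)        # ~0.352
print(stats.norm.pdf(x0))    # ~0.352
```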
Example: The Power of a Simple Table
For discrete variables, marginalizing is incredibly intuitive. It's just summing up the rows and columns of a joint probability table.
| Joint PMF | Y = Default | Y = No Default | Marginal of X |
|---|---|---|---|
| X = High Risk | 0.15 | 0.25 | 0.15 + 0.25 = 0.40 |
| X = Low Risk | 0.05 | 0.55 | 0.05 + 0.55 = 0.60 |
| Marginal of Y | 0.15 + 0.05 = 0.20 | 0.25 + 0.55 = 0.80 | 1.00 |
From the table, we can easily see:
- The marginal probability that a borrower is High Risk, regardless of their default status, is $P(X = \text{High Risk}) = 0.40$.
- The marginal probability that a borrower Defaults, regardless of their risk category, is $P(Y = \text{Default}) = 0.20$.
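The same row and column sums, done in NumPy on the table above:

```python
import numpy as np

# Joint PMF table from the lesson: rows are X (High Risk, Low Risk),
# columns are Y (Default, No Default).
joint_pmf = np.array([
    [0.15, 0.25],   # X = High Risk
    [0.05, 0.55],   # X = Low Risk
])

marginal_x = joint_pmf.sum(axis=1)   # sum across columns, i.e. over all values of Y
marginal_y = joint_pmf.sum(axis=0)   # sum across rows, i.e. over all values of X

print(marginal_x)   # [0.4 0.6] -> P(High Risk), P(Low Risk)
print(marginal_y)   # [0.2 0.8] -> P(Default),   P(No Default)
```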
Part 3: The Joint CDF - The Universal View
The Joint Cumulative Distribution Function ($F_{X,Y}(x, y)$) extends the CDF concept to higher dimensions. It gives the total accumulated probability in the rectangle from $(-\infty, -\infty)$ up to the point $(x, y)$.
Definition: Joint Cumulative Distribution Function (JCDF)
$F_{X,Y}(x, y) = P(X \le x, Y \le y)$
For continuous variables, this is the double integral of the JPDF:
$F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v)\,du\,dv$
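A quick numerical sanity check of this identity (a sketch assuming a bivariate normal joint distribution and SciPy; the lower limits of $-10$ stand in for $-\infty$):

```python
import numpy as np
from scipy import integrate, stats

# Illustrative joint distribution: bivariate normal with correlation 0.6.
rho = 0.6
joint = stats.multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

x, y = 0.5, 1.0

# JCDF directly: P(X <= x, Y <= y).
cdf_value = joint.cdf([x, y])

# JCDF by double-integrating the JPDF over the rectangle (-inf, x] x (-inf, y].
integral, _ = integrate.dblquad(
    lambda v, u: joint.pdf([u, v]),   # dblquad passes the inner variable first
    -10, x,                           # outer limits (u, the x-axis); -10 approximates -inf
    -10, y,                           # inner limits (v, the y-axis)
)

print(cdf_value)   # the JCDF value
print(integral)    # should match to several decimal places
```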
The Calculus Bridge
Just as in the 1D case, the JPDF can be recovered from the JCDF using differentiation. Here, we use a mixed partial derivative:
$f_{X,Y}(x, y) = \dfrac{\partial^2 F_{X,Y}(x, y)}{\partial x\,\partial y}$
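As a rough numerical illustration (again assuming a bivariate normal; the mixed partial is approximated with a central finite difference, so the match is only approximate):

```python
import numpy as np
from scipy import stats

# Same illustrative bivariate normal as in the earlier sketches.
rho = 0.6
joint = stats.multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

x, y, h = 0.5, 1.0, 0.05

# Mixed partial of the JCDF via a central finite difference:
# d^2F/dxdy ~ [F(x+h,y+h) - F(x+h,y-h) - F(x-h,y+h) + F(x-h,y-h)] / (4*h^2)
F = lambda a, b: joint.cdf([a, b])
mixed_partial = (F(x + h, y + h) - F(x + h, y - h)
                 - F(x - h, y + h) + F(x - h, y - h)) / (4 * h * h)

print(mixed_partial)        # finite-difference estimate of the density at (x, y)
print(joint.pdf([x, y]))    # the JPDF evaluated directly; the two should be close
```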
Part 4: Why This Matters
- Covariance and Correlation: This entire lesson is the required setup for calculating $E[XY]$, the key ingredient for covariance. Without the joint distribution, we cannot measure how two variables move together. We will do this two lessons from now.
- Advanced Risk Modeling (Copulas): In finance, risk managers often know the marginal distributions of their assets (e.g., the individual risk of stocks and bonds) but don't know their joint distribution. They use advanced functions called **copulas** to "stitch together" the marginals into a valid JPDF, allowing them to model the risk of simultaneous crashes.
- Feature Interaction in ML: When a machine learning model considers two features, it is implicitly trying to understand their joint distribution. The interaction terms in a regression model are an attempt to capture the information contained in the JPDF that isn't present in the marginals alone.
What's Next? Slicing the Landscape
We've learned how to view the entire probability landscape (the Joint Distribution) and how to see its shadows (the Marginal Distributions).
But what if we want to take a "slice" through the landscape? What does the distribution of Y look like, GIVEN that we know X is fixed at a certain value? This is the domain of **Conditional Distributions**, the crucial next step.