Lesson 1.13: Slicing the Probability Landscape

We've built our 3D 'probability map' with joint distributions. Now, we learn how to extract meaningful insights from it. We'll master taking 'shadows' of the map to get Marginal Distributions and taking 'slices' through it to get Conditional Distributions. This is the formal mathematical engine behind all predictive models.

Part 1: The Setup - Our Probability Map

Let's revisit our joint probability table from the last lesson. This table is our entire universe for this example. It represents a credit risk model with two variables: $X$, the borrower's risk category, and $Y$, the outcome of the loan.

| Joint PMF $p_{X,Y}(x,y)$ | Y = 0 (Default) | Y = 1 (Repaid) | Marginal $p(x)$ |
|---|---|---|---|
| X = 0 (High Risk) | 0.15 | 0.25 | 0.40 |
| X = 1 (Low Risk) | 0.05 | 0.55 | 0.60 |
| Marginal $p(y)$ | 0.20 | 0.80 | 1.00 |

Our goal is to understand how to formally extract the marginal and conditional information from this joint distribution.
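To make the table concrete, here is a minimal sketch in Python (using NumPy; the array layout is our own choice, not part of the lesson) that stores the joint PMF and checks that it is a valid distribution:

```python
import numpy as np

# Joint PMF p_{X,Y}(x, y): rows index X (0 = High Risk, 1 = Low Risk),
# columns index Y (0 = Default, 1 = Repaid).
joint = np.array([
    [0.15, 0.25],   # X = 0 (High Risk)
    [0.05, 0.55],   # X = 1 (Low Risk)
])

# A valid joint PMF is non-negative and its entries sum to 1 over all (x, y) pairs.
assert np.all(joint >= 0)
assert np.isclose(joint.sum(), 1.0)
print(joint.sum())  # 1.0
```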

Part 2: Marginal Distributions (The Shadows)

The Question: "What's the overall distribution of X, ignoring Y?"
As we saw last lesson, this is found by summing (or integrating) out the other variable. It's like squashing the 3D landscape to see its 2D shadow.

Definition: Marginal Distribution (Review)

To find the marginal of X, we sum the joint probabilities across all values of Y.

$$p_X(x) = \sum_{\text{all } y} p_{X,Y}(x, y)$$

From our table: The marginal distribution for the Risk Category ($X$) is simply the column of row totals on the right:

  • $P(X=\text{High Risk}) = 0.40$
  • $P(X=\text{Low Risk}) = 0.60$
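As a quick illustration (a sketch under the same array layout assumed earlier, not part of the original lesson), the marginals are just row and column sums of the joint array:

```python
import numpy as np

# Joint PMF: rows are X (0 = High Risk, 1 = Low Risk), columns are Y (0 = Default, 1 = Repaid).
joint = np.array([
    [0.15, 0.25],
    [0.05, 0.55],
])

# Marginal of X: sum out Y (sum across each row).
p_x = joint.sum(axis=1)   # array([0.40, 0.60])

# Marginal of Y: sum out X (sum down each column).
p_y = joint.sum(axis=0)   # array([0.20, 0.80])

print(p_x, p_y)
```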

Part 3: Conditional Distributions (The Slices)

The Question: "Given X has a specific value, what is Y's distribution?"
This is the essence of prediction. We're taking a slice through our probability landscape at a known value of X and examining the resulting cross-section for Y.

Remember the fundamental definition of conditional probability: $P(A \mid B) = P(A \cap B) / P(B)$. The conditional distribution is a direct extension of this idea.

Definition: Conditional Distribution

The conditional distribution of Y given X=x is the joint distribution divided by the marginal distribution of X.

$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}$$

For continuous variables, it's the same concept:

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)}$$

Example: Calculating the Conditional Distribution of Default

Let's find the distribution of Loan Outcome (Y), GIVEN we know the borrower is High Risk ($X=0$).

  1. Isolate the Slice: We only look at the first row of our table, where X=0. The joint probabilities are 0.15 (for Y=0) and 0.25 (for Y=1).
  2. Find the New Universe: The total probability of this slice is the marginal probability $p_X(0) = 0.40$. This is our new denominator.
  3. Normalize: We divide each joint probability in the slice by the marginal probability of the slice.
    • $P(Y=0 \mid X=0) = \frac{P(X=0, Y=0)}{P(X=0)} = \frac{0.15}{0.40} = 0.375$
    • $P(Y=1 \mid X=0) = \frac{P(X=0, Y=1)}{P(X=0)} = \frac{0.25}{0.40} = 0.625$

The Conditional Distribution

The conditional distribution of Y given X=0 is: {Y=0: 37.5%, Y=1: 62.5%}. Notice this is a valid probability distribution that sums to 1.
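Here is a minimal sketch of the same slice-and-normalize steps in code (reusing the array layout assumed in the earlier sketches):

```python
import numpy as np

# Joint PMF: rows are X (0 = High Risk, 1 = Low Risk), columns are Y (0 = Default, 1 = Repaid).
joint = np.array([
    [0.15, 0.25],
    [0.05, 0.55],
])

# Step 1: isolate the slice where X = 0 (the first row).
slice_x0 = joint[0, :]            # array([0.15, 0.25])

# Step 2: the "new universe" is the marginal p_X(0), the slice's total mass.
p_x0 = slice_x0.sum()             # 0.40

# Step 3: normalize the slice so it sums to 1.
p_y_given_x0 = slice_x0 / p_x0    # array([0.375, 0.625])

print(p_y_given_x0, p_y_given_x0.sum())  # [0.375 0.625] 1.0
```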

Part 4: The Payoff - Conditional Expectation

Why do we care about conditional distributions? Because they allow us to calculate **conditional expectations**. This is the mathematical definition of a prediction.

The conditional expectation $E[Y \mid X=x]$ asks: "What is the average value of Y, within the slice where X is fixed at the value x?"

Definition: Conditional Expectation

It's the same expected value formula, but we use the conditional distribution as our probability.

$$E[Y \mid X=x] = \sum_{\text{all } y} y \cdot p_{Y|X}(y|x)$$

Let's calculate the expected loan outcome (0=Default, 1=Repaid) for each risk category:

  • For High Risk (X=0): We use the conditional probabilities we just calculated (0.375 and 0.625).
    $E[Y \mid X=0] = (0 \cdot 0.375) + (1 \cdot 0.625) = 0.625$
  • For Low Risk (X=1): (First, you would calculate $P(Y \mid X=1)$ as $0.05/0.60$ and $0.55/0.60$.)
    $E[Y \mid X=1] = (0 \cdot \frac{0.05}{0.60}) + (1 \cdot \frac{0.55}{0.60}) \approx 0.917$
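A short sketch of the same calculation for both risk categories (again under the array layout assumed in the earlier sketches):

```python
import numpy as np

joint = np.array([
    [0.15, 0.25],   # X = 0 (High Risk)
    [0.05, 0.55],   # X = 1 (Low Risk)
])
y_values = np.array([0, 1])       # 0 = Default, 1 = Repaid

for x in (0, 1):
    cond = joint[x, :] / joint[x, :].sum()      # conditional PMF p_{Y|X}(y | x)
    e_y_given_x = np.sum(y_values * cond)       # E[Y | X = x]
    print(f"E[Y | X={x}] = {e_y_given_x:.3f}")
# E[Y | X=0] = 0.625
# E[Y | X=1] = 0.917
```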

The expected outcome changes based on the input. This function, $E[Y|X]$, is precisely what a linear regression model tries to estimate!
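To illustrate that last claim, here is a hedged sketch (not part of the original lesson): we draw a large simulated sample from the joint table and fit an ordinary least-squares line of Y on X with NumPy's `polyfit`. Because X is binary, the fitted line's predictions at X=0 and X=1 should land close to the two conditional expectations computed above; the sample size and random seed below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample (X, Y) pairs from the joint PMF (cell order: (0,0), (0,1), (1,0), (1,1)).
probs = [0.15, 0.25, 0.05, 0.55]
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
idx = rng.choice(len(cells), size=100_000, p=probs)
x = np.array([cells[i][0] for i in idx])
y = np.array([cells[i][1] for i in idx])

# Fit Y ≈ a*X + b by ordinary least squares.
a, b = np.polyfit(x, y, deg=1)

# With a binary X, the fitted predictions approximate the conditional means.
print(f"prediction at X=0: {b:.3f}")        # ≈ 0.625 = E[Y | X=0]
print(f"prediction at X=1: {a + b:.3f}")    # ≈ 0.917 = E[Y | X=1]
```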

Joint vs. Marginal vs. Conditional
| Type | Key Question | Analogy |
|---|---|---|
| Joint | What is the probability of (X=x AND Y=y)? | The entire 3D landscape. |
| Marginal | What is the overall probability of (X=x)? | The 2D shadow of the landscape. |
| Conditional | Given X=x, what is the probability of (Y=y)? | A 2D slice through the landscape. |

What's Next? Quantifying the Relationship

We can now fully describe the relationship between two variables using distributions. But it would be useful to have a single number that summarizes the strength and direction of their linear relationship.

The next lesson introduces **Covariance**, a measure of the joint variability of two random variables. It is the crucial ingredient needed to calculate the correlation coefficient and the slope of a regression line.