Lesson 1.2: Updating Beliefs with Conditional Probability
This is where probability becomes a tool for prediction. We learn how to update our beliefs when we receive new information. This lesson introduces Conditional Probability—the probability of an event A, GIVEN that an event B has occurred. This single idea is the foundation for everything from medical diagnoses to algorithmic trading and the famous Bayes' Theorem.
Part 1: When New Information Changes the Game
The Core Idea (Analogy): New information shrinks your universe. Before the new information, the denominator of your probability calculation is the entire sample space ($S$). After the new information, the denominator shrinks to become the event that you know has occurred.
1.1 A Visual Intuition
Imagine a simple survey of 100 software engineers. We classify them by their specialty and whether they use Python.
| Specialty | Uses Python | Doesn't Use Python | Total |
|---|---|---|---|
| Machine Learning | 35 | 5 | 40 |
| Web Development | 20 | 40 | 60 |
| Total | 55 | 45 | 100 |
Let's define our events:
- Event A: The engineer is in Machine Learning.
- Event B: The engineer uses Python.
Question 1 (Prior Probability): What is the probability of randomly selecting an ML engineer? The universe is all 100 engineers, so $P(A) = \frac{40}{100} = 0.40$.
Question 2 (Conditional Probability): Now, someone tells you, "I've already selected an engineer, and I can tell you they use Python." What is the probability they are an ML engineer, given this new information?
Our universe of 100 people just shrank. We are now only looking at the 55 engineers who use Python. Our denominator is no longer 100; it's 55, so $P(A \mid B) = \frac{35}{55} \approx 0.64$.
Notice that $P(A \mid B) \approx 0.64 > P(A) = 0.40$. The information that they use Python increased our belief that they are an ML engineer. This is the essence of conditional probability.
Definition: Conditional Probability
Our visual example leads directly to the formal definition. The numerator (35) was the intersection ($A \cap B$), and the new denominator (55) was the condition ($B$).
The conditional probability of Event $A$, given that Event $B$ has occurred, is:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$
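To see that the "shrinking universe" counting argument and the formula give the same answer, here is a minimal Python sketch (the variable names are ours, not from the survey) that works directly from the table above:

```python
# Counts from the 100-engineer survey table
ml_python, ml_no_python = 35, 5
web_python, web_no_python = 20, 40
total = ml_python + ml_no_python + web_python + web_no_python   # 100

# Prior: P(A) = P(ML), using the full universe of 100 engineers
p_A = (ml_python + ml_no_python) / total                        # 40/100 = 0.40

# Conditional via the shrunken universe: restrict to the 55 Python users
p_A_given_B_counting = ml_python / (ml_python + web_python)     # 35/55

# Conditional via the formula: P(A|B) = P(A and B) / P(B)
p_A_and_B = ml_python / total                                   # 0.35
p_B = (ml_python + web_python) / total                          # 0.55
p_A_given_B_formula = p_A_and_B / p_B

print(p_A, p_A_given_B_counting, p_A_given_B_formula)
# 0.4 0.6363... 0.6363...
```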
1.2 The Chain Rule of Probability (Multiplication Rule)
In practice, we often know the conditional probability and want to find the joint probability ($P(A \cap B)$). By rearranging the formula above, we get one of the most useful tools in statistics.
The General Multiplication Rule
The probability that two events $A$ and $B$ both happen is:
$$P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$$
Use Case: What is the probability of drawing two Aces from a deck? It's the probability of drawing the first Ace ($\frac{4}{52}$) TIMES the probability of drawing a second Ace, GIVEN you already drew the first ($\frac{3}{51}$).
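As a quick sketch (the deck is encoded as a plain Python list, an assumption of ours), we can compute the exact multiplication-rule answer and sanity-check it with a simulation:

```python
from fractions import Fraction
import random

# Exact answer via the multiplication rule:
# P(Ace1 and Ace2) = P(Ace1) * P(Ace2 | Ace1)
exact = Fraction(4, 52) * Fraction(3, 51)
print(exact, float(exact))          # 1/221 ≈ 0.0045

# Monte Carlo sanity check: draw two cards without replacement
deck = ["Ace"] * 4 + ["Other"] * 48
trials = 100_000
hits = sum(random.sample(deck, 2) == ["Ace", "Ace"] for _ in range(trials))
print(hits / trials)                # should land close to 0.0045
```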
Part 2: When New Information Changes Nothing
What happens if new information is useless? If knowing event $B$ happened tells you absolutely nothing new about the probability of event $A$, we say the events are statistically independent.
2.1 Defining Independence
The intuitive definition of independence is that the conditional probability is the same as the prior probability:
$$P(A \mid B) = P(A)$$
If we plug this simple idea back into our multiplication rule, $P(A \cap B) = P(B)\,P(A \mid B)$, we get the formal, computational definition:
Definition: Statistical Independence
Two events $A$ and $B$ are independent if and only if:
$$P(A \cap B) = P(A)\,P(B)$$
Example: The probability of rolling a 6 ($P = \frac{1}{6}$) AND flipping a Head ($P = \frac{1}{2}$). These events don't affect each other. The probability of both happening is $\frac{1}{6} \times \frac{1}{2} = \frac{1}{12}$.
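A minimal simulation sketch, assuming a fair die and a fair coin, confirms the product rule:

```python
import random

trials = 100_000
both = sum(
    (random.randint(1, 6) == 6) and (random.random() < 0.5)   # roll a 6 AND flip a Head
    for _ in range(trials)
)

# Theoretical answer: P(6) * P(Head) = (1/6) * (1/2) = 1/12 ≈ 0.0833
print(both / trials)
```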
Crucial Distinction: Independent vs. Mutually Exclusive
This is a common point of confusion. They are almost opposites!
- Mutually Exclusive (Disjoint): The events cannot happen together. If A happens, you know B is impossible: $P(A \cap B) = 0$. This is a state of strong dependence. Knowing the outcome of one gives you perfect information about the other.
- Independent: The events have no connection. Knowing A happened gives you zero information about the outcome of B: $P(A \cap B) = P(A)\,P(B)$. The sketch after this list makes the contrast concrete.
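The contrast is easy to verify on a single fair die; the events below are our own choices for illustration:

```python
SPACE = {1, 2, 3, 4, 5, 6}          # one fair die

def p(event):
    """Probability of an event (a set of faces) under equal likelihood."""
    return len(event) / len(SPACE)

# Mutually exclusive: A = {1, 2} and B = {5, 6} cannot happen together.
A, B = {1, 2}, {5, 6}
print(p(A & B), p(A) * p(B))        # 0.0 vs 0.111... -> disjoint, hence strongly dependent

# Independent: C = "roll <= 2" and D = "roll is even".
C, D = {1, 2}, {2, 4, 6}
print(p(C & D), p(C) * p(D))        # 0.1666... vs 0.1666... -> independent
```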
Part 3: The Payoff: Why This is a Pillar of Quant & ML
3.1 The Engine of Linear Regression (Econometrics)
The entire goal of Ordinary Least Squares (OLS) regression (Module 4) is to model a conditional expectation. When you write $Y = \beta_0 + \beta_1 X + \varepsilon$, you are trying to estimate $E[Y \mid X]$, the expected value of $Y$ *conditional* on some value of $X$.
The most critical assumption for OLS to work is that the predictors $X$ are independent of the error term $\varepsilon$. When this fails (a condition called endogeneity), your model is essentially "cheating" by using information hidden in the error term. This leads to biased and untrustworthy coefficient estimates ($\hat{\beta}$).
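A numpy-only sketch with simulated data (the true coefficient 2.0 and the 0.8 dependence are our own choices, not part of the lesson) shows how this bias appears:

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_beta = 10_000, 2.0

# Exogenous case: x is independent of the error term.
x = rng.normal(size=n)
eps = rng.normal(size=n)
y = true_beta * x + eps
beta_hat_ok = np.polyfit(x, y, 1)[0]

# Endogenous case: x is correlated with the error term ("information hidden in the error").
eps2 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.8 * eps2    # x2 now depends on the error
y2 = true_beta * x2 + eps2
beta_hat_biased = np.polyfit(x2, y2, 1)[0]

print(beta_hat_ok)       # close to 2.0
print(beta_hat_biased)   # systematically above 2.0 -> biased estimate
```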
3.2 The Foundation of Financial Modeling
Conditional probability and independence are the bread and butter of quantitative finance.
- Time Series Analysis: The central question is, "Are stock returns independent over time?" We test for autocorrelation, which is just a measure of dependence between a return at time $t$ and a return at time $t-1$. If they are dependent, we can build predictive models like ARMA (Module 5). If they are independent (as the Efficient Market Hypothesis suggests), prediction is much harder. A small sketch after this list shows how to measure this dependence.
- Risk Management: A portfolio's variance is simple to calculate if we assume asset returns are independent. But in a crisis, seemingly independent assets suddenly become highly dependent (their correlation goes to 1). Assuming independence when it isn't true is one of the most famous ways to blow up a hedge fund.
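Here is a small sketch with simulated return series (the AR(1) coefficient 0.3 and the volatilities are our own choices) that measures the lag-1 dependence described in the first bullet:

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi = 2_000, 0.3

# Simulated daily returns: white noise (independent) vs an AR(1) process (dependent).
iid_returns = rng.normal(0, 0.01, size=n)

ar1_returns = np.zeros(n)
shocks = rng.normal(0, 0.01, size=n)
for t in range(1, n):
    ar1_returns[t] = phi * ar1_returns[t - 1] + shocks[t]

def lag1_autocorr(r):
    """Correlation between r_t and r_{t-1}: a simple measure of serial dependence."""
    return np.corrcoef(r[1:], r[:-1])[0, 1]

print(lag1_autocorr(iid_returns))   # near 0   -> returns look independent over time
print(lag1_autocorr(ar1_returns))   # near 0.3 -> dependence we could try to model (e.g. ARMA)
```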
What's Next? Reversing the Conditional
We've mastered the art of calculating $P(A \mid B)$: the probability of an effect, given a cause.
But what about the other way around? A patient tests positive (the effect). What is the probability they have the disease (the cause)? The stock market crashed (the effect). What is the probability that it was due to a specific news event (the cause)?
This is the domain of the legendary Bayes' Theorem, which allows us to reverse the conditioning and is the single most important theorem for machine learning. We will build up to it in the next lesson.