Lesson 1.3: Law of Total Probability and Bayes' Theorem

We learned to calculate the probability of an outcome A given a condition B, P(A | B). This lesson teaches us how to reverse that logic using Bayes' Theorem to find the probability of the cause given the effect, P(B | A). We first use the Law of Total Probability to compute the denominator of Bayes' Theorem, which is the cornerstone of sequential reasoning and Bayesian statistics.

Part 1: The Law of Total Probability (The Denominator)

1.1 The Intuition

Before we can reverse conditional probability, we need a way to calculate the total, unconditional probability of an event A, given that A can happen under several mutually exclusive scenarios.

Imagine an event: A = My stock portfolio goes up tomorrow.

This can happen under two distinct, mutually exclusive scenarios:

  1. Scenario B: The market is in a Bull Market (70% chance).
  2. Scenario B^c: The market is in a Bear Market (30% chance).

The Law of Total Probability says the total probability of A is the sum of the probabilities of A happening in each scenario.

Law of Total Probability (LTP)

If B and B^c are complementary events (they partition the sample space), then the probability of any event A is:

P(A) = P(A \cap B) + P(A \cap B^c)

Using the Multiplication Rule (P(A \cap B) = P(A | B) \cdot P(B)):

P(A) = P(A | B) \cdot P(B) + P(A | B^c) \cdot P(B^c)
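
To make this concrete with the bull/bear example above, suppose, purely as a hypothetical, that the portfolio rises with probability 0.8 in a bull market and 0.3 in a bear market (only the 70%/30% market probabilities were given; these two conditional probabilities are assumed for illustration). Then:

P(A) = (0.8)(0.7) + (0.3)(0.3) = 0.56 + 0.09 = 0.65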

1.2 The General Case (Partitioning the Space)

If the sample space is partitioned into k mutually exclusive events (B_1, B_2, \dots, B_k), the LTP is:

P(A) = \sum_{i=1}^{k} P(A | B_i) \cdot P(B_i)

This lets us calculate the marginal probability P(A) from conditional probabilities, and it is precisely the denominator of Bayes' Theorem.
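
As a minimal sketch of the general formula, the snippet below computes P(A) over a hypothetical three-regime partition of the market (bull/flat/bear); all of the numbers are made up for illustration.

```python
# Law of Total Probability: P(A) = sum_i P(A | B_i) * P(B_i)
# Hypothetical partition of the sample space into three market regimes.
prior = {"bull": 0.50, "flat": 0.30, "bear": 0.20}        # P(B_i); must sum to 1
likelihood = {"bull": 0.80, "flat": 0.50, "bear": 0.30}   # P(A | B_i), assumed

p_a = sum(likelihood[regime] * prior[regime] for regime in prior)
print(f"P(portfolio goes up) = {p_a:.2f}")  # 0.40 + 0.15 + 0.06 = 0.61
```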

Part 2: Bayes' Theorem (Reversing the Condition)

We now have the tools to solve the ultimate inference problem: reversing the condition.

  • Given (Easy): We usually know the probability of the Data (A) given the Cause (B): P(A | B). (E.g., the probability of a test being positive given the patient has the disease.)
  • Desired (Hard): We want the probability of the Cause (B) given the Data (A): P(B | A). (E.g., the probability the patient has the disease given the test was positive.)

Bayes' Theorem links these two using the General Multiplication Rule.

Derivation of Bayes' Theorem (No-Skip)

Step 1: Write the Joint Probability in two ways
The probability of A and B happening together is symmetric:

P(A \cap B) = P(B \cap A)

Applying the Multiplication Rule to both sides:

P(A | B) \cdot P(B) = P(B | A) \cdot P(A)

Step 2: Isolate the Desired Quantity (P(B | A))

P(B | A) = \frac{P(A | B) \cdot P(B)}{P(A)}

Step 3: Substitute the Law of Total Probability for the Denominator
We substitute the full expression for P(A) from Part 1.

Bayes' Theorem

P(B | A) = \frac{P(A | B) \cdot P(B)}{\sum_{i=1}^{k} P(A | B_i) \cdot P(B_i)}
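
The snippet below plugs hypothetical numbers into the theorem for the disease-testing example from the start of Part 2: a 1% prevalence (the prior), a 99% true-positive rate (the likelihood), and a 5% false-positive rate. All three inputs are assumptions chosen only for illustration.

```python
# Bayes' Theorem for the disease-testing example (hypothetical numbers).
prior_disease = 0.01          # P(B): prevalence of the disease
p_pos_given_disease = 0.99    # P(A | B): probability of a positive test if diseased
p_pos_given_healthy = 0.05    # P(A | B^c): false-positive rate

# Denominator via the Law of Total Probability: P(A)
p_positive = (p_pos_given_disease * prior_disease
              + p_pos_given_healthy * (1 - prior_disease))

# Posterior: P(B | A) = P(A | B) * P(B) / P(A)
posterior = p_pos_given_disease * prior_disease / p_positive
print(f"P(disease | positive test) = {posterior:.3f}")  # about 0.167
```

Even with a highly accurate test, the posterior is only about 17%, because the rare-disease prior dominates the calculation.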

2.1 The Bayesian Terminology

Bayes' Theorem defines how we update our belief after observing data.

  • P(B): The Prior Probability (our initial belief about the cause, B, before seeing the data A).
  • P(B | A): The Posterior Probability (our updated belief about the cause, B, after seeing the data A).
  • P(A | B): The Likelihood (the probability of seeing the data A, given that our initial belief B was correct).
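
Bayesian updating is sequential: the posterior from one observation becomes the prior for the next. The sketch below reuses the hypothetical test numbers from the example above and assumes the two tests are conditionally independent given the disease status.

```python
# Sequential Bayesian updating: the posterior from one observation becomes the
# prior for the next (hypothetical numbers; tests assumed conditionally
# independent given the disease status).
def bayes_update(prior, p_data_given_b, p_data_given_not_b):
    """Return the posterior P(B | data) for a binary cause B."""
    evidence = p_data_given_b * prior + p_data_given_not_b * (1 - prior)  # LTP
    return p_data_given_b * prior / evidence

belief = 0.01                              # initial prior P(B)
belief = bayes_update(belief, 0.99, 0.05)  # after a first positive test -> ~0.167
belief = bayes_update(belief, 0.99, 0.05)  # after a second positive test -> ~0.798
print(f"Posterior after two positive tests = {belief:.3f}")
```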

Part 3: Connecting to the Real World (ML and Finance)

3.1 The Machine Learning Connection: Naive Bayes and Classification

Bayes' Theorem is the fundamental theoretical tool for statistical classification.

  • Naive Bayes Classifier: This is a simple, powerful algorithm used for tasks like spam filtering. The classifier directly computes P(\text{Class} | \text{Features}), e.g.
    P(\text{Spam} | \text{"Free Money"})
    It uses Bayes' Theorem to find the probability of the class (Spam) given the features (the words "Free" and "Money"), relying on the historical class probabilities (Priors) and the likelihoods estimated from the training data. A minimal sketch follows this list.
  • Model Evaluation: The probabilities calculated by the Law of Total Probability are also the foundation for evaluating classifiers. The confusion matrix (Lesson 3.9) uses terms like Sensitivity and Specificity, which are themselves conditional probabilities; converting them into P(\text{Disease} | \text{Positive test}) is exactly a Bayes' Theorem calculation.
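
The following is a minimal, from-scratch Naive Bayes sketch (not a production implementation); the class priors and word likelihoods are hypothetical values, and the "naive" step is treating word occurrences as conditionally independent given the class.

```python
# Minimal Naive Bayes sketch for spam filtering (all numbers hypothetical).
from math import prod

prior = {"spam": 0.4, "ham": 0.6}            # P(Class), estimated from training data
word_likelihood = {                          # P(word | Class), estimated likewise
    "spam": {"free": 0.30, "money": 0.25, "meeting": 0.01},
    "ham":  {"free": 0.02, "money": 0.03, "meeting": 0.20},
}

def posterior_spam(words):
    """P(spam | words) via Bayes' Theorem with the naive independence assumption."""
    scores = {c: prior[c] * prod(word_likelihood[c][w] for w in words)
              for c in prior}
    return scores["spam"] / sum(scores.values())  # denominator = Law of Total Probability

print(f"P(spam | 'free money') = {posterior_spam(['free', 'money']):.3f}")  # ~0.988
```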

3.2 The Quant Finance Connection: Bayesian Statistics

While the core of this curriculum is Frequentist (Module 3/4), Bayesian statistics (which uses Bayes' Theorem as its core principle) is a powerful alternative, especially in finance.

  • Incorporating Priors: A Bayesian analyst starts with a Prior Belief (P(B)) about a trading strategy's expected return. They then collect data (A) and use Bayes' Theorem to calculate the Posterior Probability (P(B | A)). This method is useful because it allows quants to incorporate human judgment or information from related markets (the Prior) into the final estimate, rather than relying solely on the sample data.
  • Event Risk Modeling: Insurance and risk models use the Law of Total Probability to forecast low-probability events. For example, the total probability of a catastrophic loss (A) is calculated by summing the loss probability conditional on different, known fault modes (B_i: earthquake, power failure, cyberattack); see the sketch after this list.
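
A minimal sketch of that event-risk calculation, with made-up fault modes and probabilities: the modes are treated as mutually exclusive, and the remaining "no fault" scenario is assumed to contribute zero loss probability, so summing over the listed modes gives the full Law of Total Probability sum.

```python
# Total probability of a catastrophic loss via the Law of Total Probability.
# Fault modes and all probabilities are hypothetical.
fault_modes = {
    # mode:           (P(mode this year), P(catastrophic loss | mode))
    "earthquake":     (0.02, 0.40),
    "power failure":  (0.10, 0.05),
    "cyberattack":    (0.05, 0.15),
}
# The implicit "no fault" scenario is assumed to have zero loss probability,
# so it adds nothing to the sum.

p_loss = sum(p_mode * p_loss_given_mode
             for p_mode, p_loss_given_mode in fault_modes.values())
print(f"P(catastrophic loss) = {p_loss:.4f}")  # 0.008 + 0.005 + 0.0075 = 0.0205
```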

What's Next? (The Functions of Randomness)

We have established the rules of probability for discrete events (sets, axioms, conditional probability).

In the next lesson, we transition from events to Random Variables. We will formally define the Probability Mass Function (PMF) and the Cumulative Distribution Function (CDF), the tools used to assign probabilities to numerical outcomes.