Lesson 1.1: The Rules of the Game: Kolmogorov's Axioms

We've learned the language to describe possibilities (Events and Sample Spaces). But how do we assign a number—a probability—to them? Can the chance of rain be -50%? Can the chance of a stock going up be 200%? No. This lesson introduces the three simple, unbreakable laws of probability. These axioms are the foundation of all statistics, ensuring our models are logical, coherent, and grounded in reality.

Part 1: The Three Fundamental Laws of Chance

The Core Idea (Analogy): Think of probability as a resource—a "lump" of certainty. You have exactly 1 unit of this certainty to distribute among all possible outcomes in your sample space ( $\Omega$ ). The axioms are the rules for how you're allowed to distribute it.

These rules were formalized by the mathematician Andrey Kolmogorov in 1933. They are simple but powerful: the probability rules you will meet in this course are either derived directly from these three axioms or built on top of them.

1.1 Axiom 1: Non-Negativity

The first rule is simple: you cannot allocate a negative amount of certainty to an event. The lowest possible chance is zero.

Axiom 1: Probabilities Can't Be Negative

For any event $A$ , its probability must be greater than or equal to zero:

$$P(A) \ge 0$$

Intuition: This is the "no negative budgets" rule. If someone tells you the chance of a server failing is -20%, their model is broken. An event can be impossible ( $P(A)=0$ ), but it can't be "less than impossible."

1.2 Axiom 2: The Total Budget is 1

The second rule ensures our books are balanced. The probability assigned to the entire sample space ( $\Omega$ ) must be exactly 1 (or 100%). For a finite or countable sample space, this means the probabilities of the individual outcomes sum to 1.

Axiom 2: Total Probability Must Be 1

The probability of the entire sample space is one:

$$P(\Omega) = 1$$

Intuition: When you roll a die, *something* has to happen. You are 100% certain the outcome will be one of the numbers in $\Omega = \{1,2,3,4,5,6\}$ . This axiom pins the top of the scale at 1. Combined with the other two axioms, it keeps every probability between 0 and 1, as Part 2 proves.

1.3 Axiom 3: The Addition Rule for Non-Overlapping Events

What if we want to know the probability of one thing OR another thing happening? The third rule tells us that if the events are mutually exclusive (no two of them can happen at the same time), we can just add their probabilities together.

Axiom 3: Additivity for Disjoint Events

If $A_1, A_2, \dots$ is a sequence of mutually exclusive events (meaning $A_i \cap A_j = \emptyset$ for $i \neq j$ ), then:

$$P(A_1 \cup A_2 \cup \dots) = P(A_1) + P(A_2) + \dots$$

Intuition: Let's go back to our dice roll. Let Event A = "roll a 1" and Event B = "roll a 6". These are disjoint. The probability of rolling "a 1 OR a 6" is simply $P(A) + P(B) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$ . There's no overlap, so no need for complicated adjustments.
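To make the three axioms concrete, here is a minimal Python sketch that checks them on the fair-die example. The names `omega`, `p`, and `prob` are illustrative choices, not part of the lesson, and the check covers only the finite case of Axiom 3.

```python
from fractions import Fraction

# Fair six-sided die: each outcome gets an equal share of the one unit of probability.
omega = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in omega}

def prob(event):
    """Probability of an event (a subset of omega), summing its outcome probabilities."""
    return sum((p[o] for o in event), Fraction(0))

# Axiom 1: no outcome has a negative probability.
axiom1_ok = all(value >= 0 for value in p.values())

# Axiom 2: the whole sample space has probability exactly 1.
axiom2_ok = prob(omega) == 1

# Axiom 3 (finite case): for disjoint events, probabilities add.
a = {1}  # "roll a 1"
b = {6}  # "roll a 6"
assert a.isdisjoint(b)
axiom3_ok = prob(a | b) == prob(a) + prob(b)

print(axiom1_ok, axiom2_ok, axiom3_ok)  # True True True
print(prob(a | b))                      # 1/3
```

Using `Fraction` instead of floats keeps the arithmetic exact, so the equality checks do not need a tolerance.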

Part 2: The Magic Show: Deriving Powerful Rules from Just Three Axioms

This is where the magic happens. We can treat the axioms as our only ingredients and logically "prove" all the other rules we need. This isn't just an academic exercise; it shows how robust the foundations of statistics are.

2.1 The Shortcut: The Complement Rule

Often, it's easier to calculate the probability of something *not* happening. The complement rule gives us a powerful shortcut.

Derivation: The Complement Rule

How can we be sure that $P(A^c) = 1 - P(A)$ ?

Step 1 (The Setup): From the last lesson, we know an event $A$ and its complement $A^c$ are, by definition, mutually exclusive. They have no overlap. We also know their union is the entire sample space: $A \cup A^c = \Omega$ .

Step 2 (Apply the Axioms): Let's take the probability of both sides: $P(A \cup A^c) = P(\Omega)$ .

Step 3 (Substitute): By Axiom 3 (Additivity), the left side becomes $P(A) + P(A^c)$ . By Axiom 2 (Total Budget), the right side is $1$ .

Step 4 (Conclusion): So, $P(A) + P(A^c) = 1$ . Rearranging the formula gives us the rule:

$$P(A^c) = 1 - P(A)$$
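As a quick illustration of why the shortcut is useful, the sketch below computes the probability of rolling at least one 6 in four rolls in two ways: by brute-force counting, and through the complement "no 6 at all". The four-roll setup is an assumed example rather than one from the lesson, and it treats all $6^4$ roll sequences as equally likely.

```python
from fractions import Fraction
from itertools import product

rolls = 4
outcomes = list(product(range(1, 7), repeat=rolls))  # all 6**4 equally likely sequences

# Direct route: count every sequence that contains at least one 6.
direct = Fraction(sum(1 for seq in outcomes if 6 in seq), len(outcomes))

# Complement route: "no 6 at all" leaves 5 choices per roll, so 5**4 sequences.
p_no_six = Fraction(5**rolls, 6**rolls)
via_complement = 1 - p_no_six

print(direct, via_complement)  # 671/1296 671/1296
```

The direct count has to inspect every sequence, while the complement route needs a single subtraction.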

2.2 The Sanity Check: Probabilities are Between 0 and 1

This feels obvious, but can we prove it? Yes, using the axioms.

Derivation: The Probability Range

We need to prove $0 \le P(A) \le 1$ for any event A.

Step 1 (The Lower Bound): Axiom 1 gives us this for free: $P(A) \ge 0$ . Easy.

Step 2 (The Upper Bound): We know from Axiom 1 that the probability of the complement, $P(A^c)$ , must also be non-negative: $P(A^c) \ge 0$ .

Step 3 (Substitute): From our newly derived Complement Rule, we know $P(A^c) = 1 - P(A)$ .

Step 4 (Conclusion): Let's substitute that into Step 2: $1 - P(A) \ge 0$ . A little algebra shows this means $1 \ge P(A)$ .

Combining the lower and upper bounds, the axioms force all probabilities to live in the range: $\mathbf{0 \le P(A) \le 1}$ .

2.3 The Real World: The General Addition Rule (Handling Overlap)

Axiom 3 was for *disjoint* events. But in the real world, events often overlap. If we just add $P(A)$ and $P(B)$ , we double-count the intersection ( $A \cap B$ ). So, we simply subtract the overlap one time to correct for this.

The General Addition Rule

For *any* two events $A$ and $B$ , the probability of their union is:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
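In the spirit of Part 2, this rule can also be derived from the axioms rather than taken on faith. One standard route, sketched below, splits $A \cup B$ and $B$ into disjoint pieces so that Axiom 3 applies to each. Here $B \cap A^c$ denotes the part of $B$ that lies outside $A$.

```latex
\begin{align*}
A \cup B &= A \cup (B \cap A^c) && \text{(disjoint pieces)} \\
P(A \cup B) &= P(A) + P(B \cap A^c) && \text{(Axiom 3)} \\
B &= (A \cap B) \cup (B \cap A^c) && \text{(disjoint pieces)} \\
P(B) &= P(A \cap B) + P(B \cap A^c) && \text{(Axiom 3)} \\
P(B \cap A^c) &= P(B) - P(A \cap B) && \text{(rearrange)} \\
P(A \cup B) &= P(A) + P(B) - P(A \cap B) && \text{(substitute)}
\end{align*}
```

A die check: with $A$ = "even" $= \{2,4,6\}$ and $B$ = "greater than 3" $= \{4,5,6\}$ , the rule gives $\frac{1}{2} + \frac{1}{2} - \frac{2}{6} = \frac{2}{3}$ , which matches counting the union $\{2,4,5,6\}$ directly.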

Part 3: Why Axioms are a Quant's Best Friend

3.1 The Model Integrity Check in Finance

Any quantitative model that prices assets or estimates risk *must* obey these axioms. If it doesn't, it's garbage.

No-Arbitrage Principle: Axiom 2 is a basic consistency requirement behind "no arbitrage" (no risk-free profit) pricing. Arbitrage-free pricing relies on a set of risk-neutral probabilities over future states, and those must form a valid probability measure. If a model's probabilities for all possible future states of the world don't sum to 1, either some outcomes are unaccounted for or the model contains an inconsistency that could be exploited. For example, if your options pricing model implies state probabilities summing to 1.05, it is internally inconsistent and must be fixed.
Avoiding Negative Probabilities: If your Monte Carlo simulation (Module 5) ever produces a negative probability for a stock price path, Axiom 1 tells you there's a fundamental bug in your code.
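The two checks above can be automated. The following is a minimal sketch of such an integrity check for a finite set of state probabilities. The function name `check_state_probabilities`, the tolerance, and the three market states are all hypothetical, chosen only for illustration.

```python
import math

def check_state_probabilities(probs, tol=1e-9):
    """Return a list of axiom violations for a finite set of state probabilities."""
    problems = []
    for state, value in probs.items():
        if value < 0:  # Axiom 1: non-negativity
            problems.append(f"negative probability for {state!r}: {value}")
    total = math.fsum(probs.values())
    if not math.isclose(total, 1.0, abs_tol=tol):  # Axiom 2: total must be 1
        problems.append(f"probabilities sum to {total}, not 1")
    return problems

# Hypothetical model output: three market states whose probabilities sum to 1.05.
broken = {"up": 0.55, "flat": 0.30, "down": 0.20}
fixed = {"up": 0.50, "flat": 0.30, "down": 0.20}

print(check_state_probabilities(broken))  # one violation: the total is not 1
print(check_state_probabilities(fixed))   # no violations
```

A tolerance is used because floating-point sums rarely equal 1 exactly. How tight it should be depends on the model and is a judgment call.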

3.2 The Foundation of Machine Learning Inference

The axioms are the bedrock of statistical testing and model evaluation.

Significance Level ( $\alpha$ ): When we set a significance level of $\alpha = 0.05$ , we are defining the probability of a Type I Error. The complement rule, $1-\alpha = 0.95$ , automatically defines the probability of *not* making a Type I error (the confidence level), assuming the null is true. This relationship is a direct result of the axioms.
The Confusion Matrix: In a classification problem, the four outcomes (True Positive, True Negative, False Positive, False Negative) are mutually exclusive and exhaustive: every prediction falls into exactly one of them. By Axiom 3 their probabilities add, and by Axiom 2 the total must be 1. This ensures our model evaluation accounts for every case.
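Both points can be seen in a few lines of Python. The confusion-matrix counts below are made up for illustration, and the cell probabilities are empirical frequencies, not true probabilities.

```python
# Hypothetical counts from evaluating a binary classifier on 1,000 examples.
counts = {"TP": 420, "TN": 480, "FP": 60, "FN": 40}

total = sum(counts.values())
probs = {cell: n / total for cell, n in counts.items()}

# Each example lands in exactly one cell, so the cells are disjoint and exhaustive.
# Axiom 3 lets us add them; Axiom 2 says the total must be 1.
assert abs(sum(probs.values()) - 1.0) < 1e-12

# Complement rule: "correct" and "incorrect" are complementary events.
accuracy = probs["TP"] + probs["TN"]
error_rate = probs["FP"] + probs["FN"]
assert abs((1 - accuracy) - error_rate) < 1e-12

# Same idea for a significance level: P(no Type I error | null is true) = 1 - alpha.
alpha = 0.05
confidence_level = 1 - alpha

print(accuracy, error_rate, confidence_level)
```

If the first assertion ever failed on real data, it would point to examples being dropped or double-counted somewhere in the evaluation pipeline.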

What's Next? Handling Dependent Events

We now have the language (sets) and the rules (axioms) to calculate probabilities for simple and combined events.

The most interesting questions in finance and ML are about how events influence each other. What is the probability a stock crashes, *given that the market is already down*? What is the probability an email is spam, *given that it contains the word 'free'*?

The next lesson, Conditional Probability, gives us the tool to answer these crucial, real-world questions.
