Lesson 4.6: The Gauss-Markov Theorem and the BLUE Property
This is the most important theoretical result in classical econometrics. We will rigorously prove that, under the Classical Linear Model assumptions, the Ordinary Least Squares (OLS) estimator is the 'Best Linear Unbiased Estimator' (BLUE). This theorem is the fundamental justification for why OLS is the default method for linear modeling.
Part 1: The OLS Assumptions (The Rules of the Game)
The Gauss-Markov Theorem only holds if a set of assumptions about the true error term ($\varepsilon$) and the data ($X$, $y$) is satisfied. These are often called the classical assumptions.
The Five Gauss-Markov Assumptions (MLR Matrix Form)
- Linearity in Parameters: The true relationship is linear in the parameters: $y = X\beta + \varepsilon$.
- Random Sample: The data $\{(x_i, y_i)\}_{i=1}^{n}$ are a random sample of size $n$ from the population.
- No Perfect Collinearity: $X$ must have full column rank, meaning the matrix $X'X$ is invertible (a quick numerical check appears below).
- Zero Conditional Mean (Exogeneity): $E[\varepsilon \mid X] = 0$. The error term is, on average, zero for any value of the predictors.
- Homoskedasticity & No Autocorrelation: The variance of the errors is constant ($\mathrm{Var}(\varepsilon_i \mid X) = \sigma^2$), and the errors are uncorrelated with each other. This is expressed in matrix form as $E[\varepsilon\varepsilon' \mid X] = \sigma^2 I_n$.
The theorem states that if assumptions 1 through 5 hold, then OLS is the Best Linear Unbiased Estimator (BLUE).
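As a quick illustration of Assumption 3, here is a minimal sketch in plain NumPy (all data values are made up): it checks whether a design matrix has full column rank, and shows how adding an exact linear combination of existing columns makes $X'X$ singular.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Design matrix with an intercept and two regressors.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Full column rank: rank equals the number of columns, so X'X is invertible.
print(np.linalg.matrix_rank(X) == X.shape[1])          # True

# Perfect collinearity: x3 is an exact linear combination of x1 and x2,
# so the rank drops below the number of columns and X'X becomes singular.
x3 = 2.0 * x1 - 0.5 * x2
X_bad = np.column_stack([np.ones(n), x1, x2, x3])
print(np.linalg.matrix_rank(X_bad) == X_bad.shape[1])   # False
```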
Part 2: Proof of Unbiasedness (The "U" in BLUE)
An estimator $\hat{\beta}$ is unbiased if its expected value is equal to the true population parameter: $E[\hat{\beta}] = \beta$.
If an estimator is unbiased, it means that if you could repeat your sampling experiment many times, the average of all the estimated $\hat{\beta}$ vectors would converge to the true vector $\beta$.
We start with the OLS master formula $\hat{\beta}_{OLS} = (X'X)^{-1}X'y$ and substitute the true model ($y = X\beta + \varepsilon$) into it.
Proof that E[β̂_OLS] = β
Step 1: Substitute the True Model
$$\hat{\beta}_{OLS} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon)$$
Step 2: Expand and Simplify
$$\hat{\beta}_{OLS} = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon$$
Since $(X'X)^{-1}X'X$ is the identity matrix $I$:
$$\hat{\beta}_{OLS} = \beta + (X'X)^{-1}X'\varepsilon$$
Step 3: Take the Expected Value
We take the expected value of both sides, conditional on $X$. Since $X$ is treated as fixed (non-stochastic) in this context, we can pull $(X'X)^{-1}X'$ outside the expectation operator:
$$E[\hat{\beta}_{OLS} \mid X] = E[\beta \mid X] + (X'X)^{-1}X'\,E[\varepsilon \mid X]$$
Since $\beta$ is a fixed, non-random vector of parameters, $E[\beta \mid X] = \beta$:
$$E[\hat{\beta}_{OLS} \mid X] = \beta + (X'X)^{-1}X'\,E[\varepsilon \mid X]$$
Step 4: Apply Assumption 4 (Exogeneity)
By OLS assumption 4 (Zero Conditional Mean, $E[\varepsilon \mid X] = 0$):
$$E[\hat{\beta}_{OLS} \mid X] = \beta + (X'X)^{-1}X' \cdot 0 = \beta$$
The OLS estimator is therefore unbiased.
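The unbiasedness result is easy to see numerically. Below is a minimal Monte Carlo sketch (all parameter values are made up for illustration): we hold $X$ fixed, redraw the errors many times, and check that the average of the OLS estimates lands on the true $\beta$.

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_sims = 200, 5000
beta_true = np.array([1.0, 2.0, -0.5])   # illustrative intercept and two slopes

# Fixed design matrix, matching the conditional-on-X argument above.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

estimates = np.empty((n_sims, 3))
for s in range(n_sims):
    eps = rng.normal(scale=1.5, size=n)                 # homoskedastic, mean-zero errors
    y = X @ beta_true + eps
    estimates[s] = np.linalg.solve(X.T @ X, X.T @ y)    # OLS: (X'X)^{-1} X'y

# The average of the estimates should be very close to beta_true.
print(beta_true)
print(estimates.mean(axis=0))
```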
Part 3: Proof of Efficiency (The "B" in BLUE)
The second part of the theorem proves that OLS is the "Best" estimator. "Best" means it has the smallest variance (the tightest sampling distribution) among all linear unbiased estimators.
Let's define any other linear estimator, $\tilde{\beta}$, as a linear combination of the dependent variable vector $y$:
$$\tilde{\beta} = Cy$$
For $\tilde{\beta}$ to be linear and unbiased, the matrix $C$ must satisfy certain conditions. For this proof, we define $C$ such that it differs from the OLS weighting matrix by some non-zero matrix $D$:
$$C = (X'X)^{-1}X' + D$$
where the first term is the OLS weighting matrix $(X'X)^{-1}X'$.
We must first find the constraint that $D$ must satisfy to keep $\tilde{\beta}$ unbiased.
Constraint for Unbiasedness
We require $E[\tilde{\beta} \mid X] = \beta$.
Since $\tilde{\beta} = Cy = C(X\beta + \varepsilon)$:
$$E[\tilde{\beta} \mid X] = CX\beta + C\,E[\varepsilon \mid X] = CX\beta$$
For $E[\tilde{\beta} \mid X] = \beta$ to hold for every possible $\beta$, we must have $CX = I$.
Substituting the definition of $C$:
$$CX = (X'X)^{-1}X'X + DX = I + DX$$
Since $(X'X)^{-1}X'X = I$, requiring $CX = I$ forces $DX = 0$.
The constraint on the alternative estimator is that the difference matrix $D$ must satisfy $DX = 0$.
The variance-covariance matrix of an estimator $\hat{\beta}$ is $\mathrm{Var}(\hat{\beta} \mid X) = E\big[(\hat{\beta} - E[\hat{\beta}])(\hat{\beta} - E[\hat{\beta}])' \mid X\big]$.
Variance of OLS ($\hat{\beta}_{OLS}$)
From Step 2 of the unbiasedness proof, we know $\hat{\beta}_{OLS} - \beta = (X'X)^{-1}X'\varepsilon$, so
$$\mathrm{Var}(\hat{\beta}_{OLS} \mid X) = (X'X)^{-1}X'\,E[\varepsilon\varepsilon' \mid X]\,X(X'X)^{-1}$$
Applying Assumption 5 ($E[\varepsilon\varepsilon' \mid X] = \sigma^2 I_n$):
Variance of the OLS Estimator
$$\mathrm{Var}(\hat{\beta}_{OLS} \mid X) = \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}$$
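This formula can be verified numerically. Below is a minimal sketch (made-up numbers) comparing the analytic covariance $\sigma^2(X'X)^{-1}$ with the empirical covariance of simulated OLS estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims, sigma = 200, 10000, 1.5
beta_true = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design

# Analytic variance-covariance matrix: sigma^2 (X'X)^{-1}
analytic = sigma**2 * np.linalg.inv(X.T @ X)

draws = np.empty((n_sims, 2))
for s in range(n_sims):
    y = X @ beta_true + rng.normal(scale=sigma, size=n)
    draws[s] = np.linalg.solve(X.T @ X, X.T @ y)

# Empirical covariance of the simulated estimates should match the formula.
empirical = np.cov(draws, rowvar=False)
print(analytic)
print(empirical)
```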
Variance of the Alternative Estimator ($\tilde{\beta}$)
Under the classical assumptions, $\tilde{\beta} - \beta = C\varepsilon$, so $\mathrm{Var}(\tilde{\beta} \mid X) = C\,E[\varepsilon\varepsilon' \mid X]\,C' = \sigma^2 CC'$.
Expanding the product:
$$CC' = \big[(X'X)^{-1}X' + D\big]\big[(X'X)^{-1}X' + D\big]' = (X'X)^{-1} + (X'X)^{-1}X'D' + DX(X'X)^{-1} + DD'$$
The cross-product terms are zero because $DX = 0$ (and hence $X'D' = (DX)' = 0$).
The variance equation simplifies to:
$$\mathrm{Var}(\tilde{\beta} \mid X) = \sigma^2\big[(X'X)^{-1} + DD'\big]$$
We substitute $\mathrm{Var}(\hat{\beta}_{OLS} \mid X) = \sigma^2(X'X)^{-1}$:
$$\mathrm{Var}(\tilde{\beta} \mid X) = \mathrm{Var}(\hat{\beta}_{OLS} \mid X) + \sigma^2 DD'$$
The difference between the variance of the alternative estimator and the OLS estimator is $\sigma^2 DD'$. Since $DD'$ is a positive semi-definite matrix (for any vector $a$, $a'DD'a = \lVert D'a \rVert^2 \geq 0$), the variance of the OLS estimator is smaller than or equal to the variance of any other linear unbiased estimator.
The Gauss-Markov Theorem
Under the five classical OLS assumptions, the Ordinary Least Squares (OLS) estimator is the Best Linear Unbiased Estimator (BLUE).
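To see the "Best" part concretely: any other linear unbiased estimator must have at least as much variance as OLS. One simple (hypothetical) member of that class is OLS run on only the first half of the sample; it is still linear and unbiased, but it discards information, so the theorem predicts a larger variance. A minimal Monte Carlo sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_sims, sigma = 200, 10000, 1.0
beta_true = np.array([0.5, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])

ols_draws = np.empty((n_sims, 2))
alt_draws = np.empty((n_sims, 2))
for s in range(n_sims):
    y = X @ beta_true + rng.normal(scale=sigma, size=n)
    # OLS uses every observation.
    ols_draws[s] = np.linalg.solve(X.T @ X, X.T @ y)
    # Alternative linear unbiased estimator: OLS on the first half only.
    Xh, yh = X[: n // 2], y[: n // 2]
    alt_draws[s] = np.linalg.solve(Xh.T @ Xh, Xh.T @ yh)

# Both are centred on beta_true, but the alternative has larger variance.
print(ols_draws.mean(axis=0), alt_draws.mean(axis=0))
print(ols_draws.var(axis=0), alt_draws.var(axis=0))
```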
Part 4: Connecting to the Real World (ML & Finance)
The Gauss-Markov theorem beautifully isolates the Unbiasedness and Variance components, which are the two central concerns in the ML Bias-Variance Trade-off.
- High Variance (Not Best): In ML terms, an OLS estimator with large variance is overfitting the training data. This is why we sometimes abandon OLS (stepping outside the class of linear unbiased estimators) for techniques like Ridge Regression or Lasso Regression.
- Introducing Bias (Lasso/Ridge): Lasso and Ridge introduce a small amount of intentional bias (meaning $E[\hat{\beta}] \neq \beta$) to dramatically reduce the estimator's variance. In practice, this trade-off often leads to better predictive performance on unseen data, showing that sometimes being BLUE is less important than being a robust estimator with low variance, as the sketch below illustrates.
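A minimal sketch of that trade-off, using the closed-form ridge estimator $(X'X + \lambda I)^{-1}X'y$ on two highly correlated, made-up regressors: ridge is visibly biased, but its variance is far lower than that of OLS.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sims, lam = 50, 5000, 10.0
beta_true = np.array([1.0, 1.0])

# Two highly correlated regressors -> OLS has very high variance.
base = rng.normal(size=n)
X = np.column_stack([base + 0.05 * rng.normal(size=n),
                     base + 0.05 * rng.normal(size=n)])

ols = np.empty((n_sims, 2))
ridge = np.empty((n_sims, 2))
I2 = np.eye(2)
for s in range(n_sims):
    y = X @ beta_true + rng.normal(size=n)
    ols[s] = np.linalg.solve(X.T @ X, X.T @ y)
    ridge[s] = np.linalg.solve(X.T @ X + lam * I2, X.T @ y)   # ridge: (X'X + lam I)^{-1} X'y

# Ridge is biased (its mean is pulled away from beta_true) but far less variable.
print("mean:", ols.mean(axis=0), ridge.mean(axis=0))
print("var: ", ols.var(axis=0), ridge.var(axis=0))
```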
In finance, the variance of an estimator is directly linked to estimation risk.
- Estimation Risk: In quantitative trading, when we use OLS to estimate factor exposures (betas) for hedging, we need the most stable, efficient estimate possible. The Gauss-Markov theorem assures the quant that the OLS formula provides the most precise (lowest variance) linear unbiased hedge ratios available, *provided the market is well-behaved* (i.e., satisfies the OLS assumptions).
- When Assumptions Fail: The moment a market becomes highly volatile (violating the homoskedasticity assumption, so that $\mathrm{Var}(\varepsilon_i \mid X)$ is no longer a constant $\sigma^2$), the Gauss-Markov guarantee no longer applies. The OLS estimator is no longer BLUE, forcing quants to switch to methods like Generalised Least Squares (GLS) to recover efficiency, or to heteroskedasticity-robust standard errors to keep inference valid (see the sketch below).
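A minimal sketch of that failure mode, assuming the statsmodels library is available: we generate errors whose variance grows with the regressor, then compare classical standard errors with heteroskedasticity-consistent (HC1) ones. The point estimates are identical; only the uncertainty estimates change.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)

# Heteroskedastic errors: the noise grows with |x|, violating Assumption 5.
eps = rng.normal(size=n) * (0.5 + 2.0 * np.abs(x))
y = 1.0 + 2.0 * x + eps

model = sm.OLS(y, X)
classical = model.fit()                  # classical (Gauss-Markov) standard errors
robust = model.fit(cov_type="HC1")       # heteroskedasticity-consistent standard errors

print(classical.bse)   # tends to understate the slope's uncertainty here
print(robust.bse)      # typically larger and more reliable under heteroskedasticity
```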
What's Next? (Dealing with Assumptions)
We have now mastered the OLS formula, proved its properties, and seen the critical role of the five Gauss-Markov Assumptions.
In the next lesson, we move from theoretical proof to practical application: we will begin to explore the consequences and detection of the most common OLS assumption failures, starting with testing the significance of individual coefficients.