Lesson 4.1: The Full OLS Derivation (SLR)
This is the foundational mathematical workout for the entire module. We start with our objective—to find the 'best' line—and rigorously derive the exact formulas for the OLS estimators using both calculus and matrix algebra, with no steps skipped.
Part 1: The Objective - Minimizing the Error
In the previous lesson, we established our goal. We have a cloud of data points $(x_i, y_i)$ and we want to find the line that is the "best fit."
We defined "best" in a very specific way: the line that minimizes the sum of the squared differences between our actual values $y_i$ and our predicted values $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$.
The OLS Objective Function
Find the values of $\hat\beta_0$ and $\hat\beta_1$ that minimize the Sum of Squared Residuals (SSR):

$$\text{SSR}(\hat\beta_0, \hat\beta_1) = \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2$$
This is a classic optimization problem. We will now solve it in two ways: the intuitive calculus way, and the powerful matrix algebra way.
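To make the objective concrete, here is a minimal NumPy sketch (the tiny x and y arrays are made up purely for illustration) that evaluates the SSR for a couple of candidate lines; OLS is simply the search for the pair of coefficients that makes this number as small as possible.

```python
import numpy as np

# Hypothetical toy data (any paired x, y sample would do)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(b0, b1, x, y):
    """Sum of Squared Residuals for a candidate line y_hat = b0 + b1 * x."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

# Two candidate lines: OLS finds the (b0, b1) pair with the smallest SSR
print(ssr(0.0, 2.0, x, y))   # one guess
print(ssr(0.5, 1.8, x, y))   # another guess
```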
Part 2: The Calculus Derivation ('No-Skip' Version)
The Calculus Toolkit
To find the minimum of a function of two variables ($\hat\beta_0$ and $\hat\beta_1$), we must take the partial derivative with respect to each variable, set both derivatives to zero, and solve the resulting system of two equations.
Step 1: Minimize with respect to β̂₀ (The Intercept)
We take the partial derivative of the SSR with respect to $\hat\beta_0$ and set it to zero. Using the chain rule, the derivative of the inside term $(y_i - \hat\beta_0 - \hat\beta_1 x_i)$ with respect to $\hat\beta_0$ is $-1$:

$$\frac{\partial \text{SSR}}{\partial \hat\beta_0} = \sum_{i=1}^{n} 2\left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)(-1) = 0$$

Divide by $-2$ and distribute the sum:

$$\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \hat\beta_0 - \sum_{i=1}^{n} \hat\beta_1 x_i = 0$$

Applying summation rules ($\sum_{i=1}^{n} \hat\beta_0 = n\hat\beta_0$ and $\sum_{i=1}^{n} \hat\beta_1 x_i = \hat\beta_1 \sum_{i=1}^{n} x_i$):

$$\sum_{i=1}^{n} y_i - n\hat\beta_0 - \hat\beta_1 \sum_{i=1}^{n} x_i = 0$$

Now, we solve for $\hat\beta_0$:

$$n\hat\beta_0 = \sum_{i=1}^{n} y_i - \hat\beta_1 \sum_{i=1}^{n} x_i$$

Dividing by $n$ and recognizing the definitions of the sample means, $\bar{y} = \tfrac{1}{n}\sum_{i=1}^{n} y_i$ and $\bar{x} = \tfrac{1}{n}\sum_{i=1}^{n} x_i$, gives our first key result:

Result 1: The OLS Intercept

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$
Step 2: Minimize with respect to β̂₁ (The Slope)
We take the partial derivative with respect to $\hat\beta_1$. The chain rule derivative of the inside term is now $-x_i$:

$$\frac{\partial \text{SSR}}{\partial \hat\beta_1} = \sum_{i=1}^{n} 2\left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)(-x_i) = 0$$

Divide by $-2$ and distribute the $x_i$ term:

$$\sum_{i=1}^{n} x_i y_i - \hat\beta_0 \sum_{i=1}^{n} x_i - \hat\beta_1 \sum_{i=1}^{n} x_i^2 = 0$$
This is our second "First-Order Condition."
Step 3: Solve the System of Equations
We now substitute our result for $\hat\beta_0$ from Step 1, $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$, into our second equation from Step 2:

$$\sum_{i=1}^{n} x_i y_i - (\bar{y} - \hat\beta_1 \bar{x}) \sum_{i=1}^{n} x_i - \hat\beta_1 \sum_{i=1}^{n} x_i^2 = 0$$

Distribute and then group the terms that contain $\hat\beta_1$:

$$\sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i - \hat\beta_1 \left( \sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i \right) = 0$$

Now, we can solve for $\hat\beta_1$:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i}$$

This is correct, but ugly. Using the standard algebraic identities $\sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2$, the numerator becomes the numerator of the sample covariance and the denominator becomes the numerator of the sample variance. This gives the final, beautiful result:

Result 2: The OLS Slope Estimator

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)}$$
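As a quick sanity check on Results 1 and 2, here is a short NumPy sketch (using the same made-up toy data as above) that applies the two formulas directly and compares them against NumPy's built-in least-squares polynomial fit:

```python
import numpy as np

# Same hypothetical toy data as above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Result 2: slope = sum of cross-deviations / sum of squared x-deviations
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Result 1: intercept = y_bar - slope * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)

# Cross-check against NumPy's least-squares fit (returns [slope, intercept])
print(np.polyfit(x, y, deg=1))
```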
Part 3: The Matrix Derivation (The General 'Master' Formula)
That calculus was intense. The matrix algebra approach is more abstract but far more powerful, giving a single formula that works for one predictor or one thousand.
Deriving the Matrix ('Normal') Equations
Step 1: Write the SSR in matrix form, where $\mathbf{y}$ is the vector of outcomes, $\mathbf{X}$ is the design matrix (with a leading column of ones), and $\hat{\boldsymbol\beta}$ is the vector of coefficients:

$$\text{SSR}(\hat{\boldsymbol\beta}) = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})^\top (\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})$$

Step 2: Expand the expression (using the fact that $\mathbf{y}^\top \mathbf{X} \hat{\boldsymbol\beta}$ is a scalar, so it equals its own transpose $\hat{\boldsymbol\beta}^\top \mathbf{X}^\top \mathbf{y}$):

$$\text{SSR}(\hat{\boldsymbol\beta}) = \mathbf{y}^\top \mathbf{y} - 2\hat{\boldsymbol\beta}^\top \mathbf{X}^\top \mathbf{y} + \hat{\boldsymbol\beta}^\top \mathbf{X}^\top \mathbf{X} \hat{\boldsymbol\beta}$$

Step 3: Differentiate with respect to the vector $\hat{\boldsymbol\beta}$ and set to zero. (This uses the matrix calculus rules $\frac{\partial}{\partial \mathbf{b}} \mathbf{a}^\top \mathbf{b} = \mathbf{a}$ and $\frac{\partial}{\partial \mathbf{b}} \mathbf{b}^\top \mathbf{A} \mathbf{b} = 2\mathbf{A}\mathbf{b}$ for symmetric $\mathbf{A}$.)

$$\frac{\partial \text{SSR}}{\partial \hat{\boldsymbol\beta}} = -2\mathbf{X}^\top \mathbf{y} + 2\mathbf{X}^\top \mathbf{X} \hat{\boldsymbol\beta} = \mathbf{0}$$

Step 4: Solve for $\hat{\boldsymbol\beta}$. Dividing by 2 and rearranging gives the Normal Equations:

$$\mathbf{X}^\top \mathbf{X} \hat{\boldsymbol\beta} = \mathbf{X}^\top \mathbf{y}$$
The OLS Estimator in Matrix Form
Solving the Normal Equations by pre-multiplying both sides by the inverse $(\mathbf{X}^\top \mathbf{X})^{-1}$ gives the master formula:

$$\hat{\boldsymbol\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$
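In code, the master formula is a single line of linear algebra. Here is a minimal NumPy sketch, again on the illustrative toy data (np.linalg.solve is used on the Normal Equations rather than forming the inverse explicitly, which is the numerically safer equivalent):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of ones (for the intercept) next to the x column
X = np.column_stack([np.ones_like(x), x])

# Master formula: beta_hat = (X'X)^{-1} X'y, solved via the Normal Equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # [intercept, slope]
```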
Part 4: The Grand Unification
This is the moment of truth. We must prove that the abstract matrix formula gives the exact same answer as our intuitive calculus formula for $\hat\beta_1$ in the simple, one-variable case. This is a tough but essential 'no-skip' proof.
Proof: The Matrix Formula Simplifies to the Calculus Formula
For SLR, we define our matrices:

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad \hat{\boldsymbol\beta} = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix}$$

1. Calculate $\mathbf{X}^\top \mathbf{X}$:

$$\mathbf{X}^\top \mathbf{X} = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}$$

2. Calculate $\mathbf{X}^\top \mathbf{y}$:

$$\mathbf{X}^\top \mathbf{y} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

3. Calculate the inverse $(\mathbf{X}^\top \mathbf{X})^{-1}$ using the $2 \times 2$ inverse formula:

$$(\mathbf{X}^\top \mathbf{X})^{-1} = \frac{1}{n \sum x_i^2 - \left( \sum x_i \right)^2} \begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix}$$

The determinant is $n \sum x_i^2 - \left( \sum x_i \right)^2$.

4. Assemble $\hat{\boldsymbol\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$:

$$\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} = \frac{1}{n \sum x_i^2 - \left( \sum x_i \right)^2} \begin{bmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{bmatrix} \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

5. Solve for $\hat\beta_1$ (the second row):

$$\hat\beta_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$

Dividing the numerator and denominator by $n$, the numerator simplifies to $\sum (x_i - \bar{x})(y_i - \bar{y})$ and the denominator to $\sum (x_i - \bar{x})^2$, so

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Q.E.D. The formulas are identical. The matrix method is confirmed.
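As a numerical spot-check of the unification, the sketch below simulates some data (the true coefficients 1.5 and 2.0 are arbitrary choices for illustration) and confirms that the calculus route and the matrix route return the same estimates up to floating-point error:

```python
import numpy as np

# Simulated data; any sample with non-constant x works
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.5 + 2.0 * x + rng.normal(size=100)

# Calculus route: covariance-over-variance slope, then the intercept
b1_calc = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_calc = y.mean() - b1_calc * x.mean()

# Matrix route: (X'X)^{-1} X'y
X = np.column_stack([np.ones_like(x), x])
b0_mat, b1_mat = np.linalg.solve(X.T @ X, X.T @ y)

# The two derivations agree to floating-point precision
print(np.allclose([b0_calc, b1_calc], [b0_mat, b1_mat]))   # True
```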
What's Next? Is Our Line Any Good?
We have done it. We've opened the black box and rigorously derived the exact formulas for the 'best' fitting line using two different methods.
But this only gives us the line itself. It doesn't tell us how well that line actually fits the data. Does it explain 80% of the variation in our Y variable, or only 2%?
In the next lesson, we will develop the tools to answer this, including the single most famous statistic in data analysis: **R-Squared (R²)**.