Putting the geometry of projections to work with a concrete, step-by-step recipe.
In our last lesson, we achieved a major theoretical breakthrough. We used the geometry of projections to derive a formula that solves the unsolvable. We discovered that the "best" approximate solution $\hat{\mathbf{x}}$ to an inconsistent system $A\mathbf{x} = \mathbf{b}$ can be found by solving a new, consistent system called the Normal Equations:
$$A^T A\,\hat{\mathbf{x}} = A^T \mathbf{b}$$
Today, we take this beautiful equation out of the world of theory and put it to work. We will use it as a step-by-step recipe to solve a concrete linear regression problem. This least-squares procedure is the basic method for fitting lines and curves to data.
The Problem: Fitting a Line to Data
Let's return to our simple scientist's experiment. We have three data points $(x, y)$ that don't lie perfectly on a line, and we want to find the "line of best fit," $y = mx + c$.
| $x$ (Temperature) | $y$ (Pressure) |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |
Our goal is to find the optimal values for the slope m and the y-intercept c. These are our unknowns.
Step 1: Set up the (Unsolvable) System `Ax = b`
First, we must translate our problem into the language of linear algebra. The unknown vector collects the two parameters we are solving for: $\mathbf{x} = [m, c]^T$.
We write one equation for each data point:
$$\begin{aligned} m(1) + c(1) &= 1 \\ m(2) + c(1) &= 3 \\ m(3) + c(1) &= 2 \end{aligned}$$
From this, we construct our matrix A and vector b:
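$$A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} m \\ c \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$$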
Our system is $A\mathbf{x} = \mathbf{b}$. As we know, it has no exact solution: the three points do not all lie on one line, so $\mathbf{b}$ is not in the column space of $A$.
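Here is a minimal sketch of this step. It assumes plain Python lists of rows rather than any matrix library, and it builds $A$ and $\mathbf{b}$ from the data points. The helper name `build_system` is hypothetical. The snippet also makes the inconsistency concrete: the line forced by the first two equations misses the third point.

```python
# Data points (x, y) from the table above.
points = [(1, 1), (2, 3), (3, 2)]

def build_system(points):
    """Build the design matrix A (rows [x, 1]) and target vector b (the y values).

    Hypothetical helper for illustration: one row per data point,
    matching the equation m*x + c*1 = y.
    """
    A = [[x, 1] for x, _ in points]
    b = [y for _, y in points]
    return A, b

A, b = build_system(points)
print(A)  # [[1, 1], [2, 1], [3, 1]]
print(b)  # [1, 3, 2]

# Why there is no exact solution: the first two equations alone force
# m = (3 - 1) / (2 - 1) = 2 and c = 1 - 2*1 = -1 ...
m_exact = (points[1][1] - points[0][1]) / (points[1][0] - points[0][0])
c_exact = points[0][1] - m_exact * points[0][0]
# ... but that line predicts y = 5 at x = 3, not the observed 2.
print(m_exact * points[2][0] + c_exact)  # 5.0
```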
Step 2: Assemble the Pieces of the Normal Equations
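First, compute $A^T A$:
$$A^T A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}$$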
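Notice that $A^T A$ is symmetric. This is always true, because $(A^T A)^T = A^T (A^T)^T = A^T A$.
Next, compute the right-hand side $A^T \mathbf{b}$:
$$A^T \mathbf{b} = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 13 \\ 6 \end{bmatrix}$$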
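The next sketch carries out Step 2 in plain Python, again assuming lists of rows. `transpose`, `matmul`, and `matvec` are hypothetical helpers written for illustration, not a library API. The snippet computes both $A^T A$ and $A^T \mathbf{b}$ (the right-hand side of the Normal Equations) and checks the symmetry claim.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply X (p x q) by Y (q x r), both stored as lists of rows."""
    Y_cols = transpose(Y)
    return [[sum(u * v for u, v in zip(row, col)) for col in Y_cols] for row in X]

def matvec(X, vec):
    """Multiply matrix X (list of rows) by a vector."""
    return [sum(u * v for u, v in zip(row, vec)) for row in X]

A = [[1, 1], [2, 1], [3, 1]]
b = [1, 3, 2]

At = transpose(A)      # [[1, 2, 3], [1, 1, 1]]
AtA = matmul(At, A)
Atb = matvec(At, b)

print(AtA)  # [[14, 6], [6, 3]]
print(Atb)  # [13, 6]

# A^T A equals its own transpose, as it must for any A.
assert AtA == transpose(AtA)
```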
Step 3: Solve the Normal Equations
We now have a clean, solvable system: $(A^T A)\hat{\mathbf{x}} = A^T \mathbf{b}$.
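$$\begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 13 \\ 6 \end{bmatrix}$$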
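It can help to see where these numbers come from. For a straight-line fit to $n$ points $(x_i, y_i)$, multiplying out the products entry by entry turns them into sums over the data:

$$A^T A = \begin{bmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & n \end{bmatrix}, \qquad A^T \mathbf{b} = \begin{bmatrix} \sum x_i y_i \\ \sum y_i \end{bmatrix}$$

With our three points, $\sum x_i^2 = 1 + 4 + 9 = 14$, $\sum x_i = 1 + 2 + 3 = 6$, $n = 3$, $\sum x_i y_i = 1 + 6 + 6 = 13$, and $\sum y_i = 1 + 3 + 2 = 6$. These values reproduce the system above.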
This is a standard 2x2 system. Let's use elimination. The augmented matrix is:
$$\left[\begin{array}{cc|c} 14 & 6 & 13 \\ 6 & 3 & 6 \end{array}\right]$$
Simplify Row 2 by dividing by 3:
$$\left[\begin{array}{cc|c} 14 & 6 & 13 \\ 2 & 1 & 2 \end{array}\right]$$
From Row 2, we have $2m + c = 2$, so $c = 2 - 2m$. Substitute into Row 1:
$$14m + 6(2 - 2m) = 13 \implies 14m + 12 - 12m = 13 \implies 2m = 1 \implies m = 0.5$$
Now find $c$: $c = 2 - 2(0.5) = 1$.
The solution is $\hat{\mathbf{x}} = \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}$.
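The same elimination can be checked mechanically. This is a minimal sketch, not a general solver. `solve_2x2` is a hypothetical helper that assumes a nonsingular 2×2 system with a nonzero top-left entry. It uses Python's `fractions.Fraction` so the arithmetic stays exact. It eliminates the first unknown from Row 2 directly instead of dividing Row 2 by 3 first, but it reaches the same answer.

```python
from fractions import Fraction

def solve_2x2(M, rhs):
    """Solve M @ [u, v] = rhs for a 2x2 system by elimination, in exact arithmetic.

    Hypothetical helper. Assumes M[0][0] != 0 and that M is nonsingular,
    which holds for A^T A whenever the x-values are not all equal.
    """
    (p, q), (r, s) = [[Fraction(val) for val in row] for row in M]
    e1, e2 = (Fraction(val) for val in rhs)
    # Eliminate the first unknown from row 2: row2 -= (r / p) * row1.
    factor = r / p
    s2 = s - factor * q
    e2 = e2 - factor * e1
    # Back-substitute: solve row 2 for v, then row 1 for u.
    v = e2 / s2
    u = (e1 - q * v) / p
    return u, v

m, c = solve_2x2([[14, 6], [6, 3]], [13, 6])
print(m, c)  # 1/2 1
```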
Step 4: Interpret the Result
The values $m = 0.5$ and $c = 1$ define the line of best fit for our data. The equation is:
$$y = 0.5x + 1$$
This is the line that minimizes the sum of the squared vertical distances from our data points to the line.
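The next plain-Python sketch sanity-checks this claim, using a hypothetical `sse` helper. It computes the residuals $\mathbf{b} - A\hat{\mathbf{x}}$ and confirms they are orthogonal to the columns of $A$, which is exactly the condition $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$ that the Normal Equations encode. It also confirms that nudging $m$ or $c$ away from the solution only increases the squared error. The perturbation check samples a few nearby points; it illustrates optimality rather than proving it.

```python
A = [[1, 1], [2, 1], [3, 1]]
b = [1, 3, 2]

def sse(m, c):
    """Sum of squared vertical distances from the data to the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for (x, _), y in zip(A, b))

m_hat, c_hat = 0.5, 1.0

# Residuals r = b - A x_hat: vertical gaps between each data point and the line.
residuals = [y - (m_hat * x + c_hat) for (x, _), y in zip(A, b)]
print(residuals)          # [-0.5, 1.0, -0.5]
print(sse(m_hat, c_hat))  # 1.5

# Geometric check: A^T r = 0, i.e. the residual is orthogonal to both columns of A.
At_r = [sum(row[j] * r for row, r in zip(A, residuals)) for j in range(2)]
print(At_r)               # [0.0, 0.0]

# Moving either parameter away from the solution increases the error.
for dm, dc in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(m_hat + dm, c_hat + dc) > sse(m_hat, c_hat)
```

With these data, the smallest achievable sum of squared vertical distances is 1.5.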
**Up Next:** The Normal Equations are a brilliant theoretical tool, but in the world of high-precision numerical computation, they have a hidden dark side. We'll explore "The Problem with the Normal Equations".