The Algebraic Solution: The Normal Equations

Putting the geometry of projections to work with a concrete, step-by-step recipe.

In our last lesson, we achieved a major theoretical breakthrough. We used the geometry of projections to derive a formula that solves the unsolvable. We discovered that the "best" approximate solution x^\hat{x} to an inconsistent system Ax=bAx=b can be found by solving a new, consistent system called the Normal Equations:

ATAx^=ATbA^TA\hat{x} = A^Tb

Today, we take this beautiful equation out of the world of theory and put it to work. We will use it as a step-by-step recipe to solve a real-world linear regression problem. This is the fundamental algorithm for fitting lines and curves to data.

The Problem: Fitting a Line to Data
Let's return to our simple scientist's experiment. We have three data points (x,y)(x, y) that don't lie perfectly on a line, and we want to find the "line of best fit," y=mx+cy = mx + c.
x (Temperature)y (Pressure)
11
23
32

Our goal is to find the optimal values for the slope mm and the y-intercept cc. These are our unknowns.

Step 1: Set up the (Unsolvable) System `Ax = b`

First, we must translate our problem into the language of linear algebra. Our unknown vector is what we're solving for: x=[m,c]Tx = [m, c]^T.

We write one equation for each data point:

  • m(1)+c(1)=1m(1) + c(1) = 1
  • m(2)+c(1)=3m(2) + c(1) = 3
  • m(3)+c(1)=2m(3) + c(1) = 2

From this, we construct our matrix AA and vector bb:

A=[112131],x=[mc],b=[132]A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} m \\ c \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

Our system is Ax=bAx = b. As we know, there is no exact solution. bb is not in the column space of AA.

Step 2: Assemble the Pieces of the Normal Equations

First, compute ATAA^TA:

ATA=[123111][112131]=[14663]A^TA = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}

Notice that ATAA^TA is symmetric, which is always true.

Next, compute ATbA^Tb:

ATb=[123111][132]=[1(1)+2(3)+3(2)1(1)+1(3)+1(2)]=[136]A^Tb = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 1(1) + 2(3) + 3(2) \\ 1(1) + 1(3) + 1(2) \end{bmatrix} = \begin{bmatrix} 13 \\ 6 \end{bmatrix}
Step 3: Solve the Normal Equations for `x̂`
We now have a clean, solvable system: (ATA)x^=ATb(A^TA)\hat{x} = A^Tb.
[14663][mc]=[136]\begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 13 \\ 6 \end{bmatrix}

This is a standard 2x2 system. Let's use elimination. The augmented matrix is:

[14613636]\left[ \begin{array}{cc|c} 14 & 6 & 13 \\ 6 & 3 & 6 \end{array} \right]

Simplify Row 2 by dividing by 3:

[14613212]\left[ \begin{array}{cc|c} 14 & 6 & 13 \\ 2 & 1 & 2 \end{array} \right]

From Row 2, we have 2m+c=22m + c = 2, so c=22mc = 2 - 2m. Substitute into Row 1:

14m+6(22m)=13    14m+1212m=13    2m=1    m=0.514m + 6(2 - 2m) = 13 \implies 14m + 12 - 12m = 13 \implies 2m = 1 \implies m = 0.5

Now find cc: c=22(0.5)=1c = 2 - 2(0.5) = 1

The solution is x^=[mc]=[0.51]\hat{x} = \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}.

Step 4: Interpret the Result

The values m=0.5m=0.5 and c=1c=1 define the line of best fit for our data. The equation is:

y=0.5x+1y = 0.5x + 1

This is the line that minimizes the sum of the squared vertical distances from our data points to the line.

**Up Next:** The Normal Equations are a brilliant theoretical tool, but in the world of high-precision numerical computation, they have a hidden dark side. We'll explore "The Problem with the Normal Equations".