Lesson 5.3: The Algebraic Solution: The Normal Equations

Putting the geometry of projections to work with a concrete, step-by-step recipe.

In our last lesson, we achieved a major theoretical breakthrough. We used the geometry of projections to derive a formula that solves the unsolvable. We discovered that the "best" approximate solution $\hat{x}$ to an inconsistent system $Ax=b$ can be found by solving a new, consistent system called the Normal Equations:

A^TA\hat{x} = A^Tb

Today, we take this beautiful equation out of the world of theory and put it to work. We will use it as a step-by-step recipe to solve a real-world linear regression problem. This is the fundamental algorithm for fitting lines and curves to data.

The Problem: Fitting a Line to Data

Let's return to our simple scientist's experiment. We have three data points

(x, y)

that don't lie perfectly on a line, and we want to find the "line of best fit,"

y = mx + c

x (Temperature)	y (Pressure)
1	1
2	3
3	2

Our goal is to find the optimal values for the slope $m$ and the y-intercept $c$ . These are our unknowns.

The Recipe

Step 1: Set up the (Unsolvable) System `Ax = b`

First, we must translate our problem into the language of linear algebra. Our unknown vector is what we're solving for: $x = [m, c]^T$ .

We write one equation for each data point:

$m(1) + c(1) = 1$
$m(2) + c(1) = 3$
$m(3) + c(1) = 2$

From this, we construct our matrix $A$ and vector $b$ :

A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} m \\ c \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}

Our system is $Ax = b$ . As we know, there is no exact solution. $b$ is not in the column space of $A$ .

Step 2: Assemble the Pieces of the Normal Equations

First, compute $A^TA$ :

A^TA = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix}

Notice that $A^TA$ is symmetric, which is always true.

Next, compute $A^Tb$ :

A^Tb = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 1(1) + 2(3) + 3(2) \\ 1(1) + 1(3) + 1(2) \end{bmatrix} = \begin{bmatrix} 13 \\ 6 \end{bmatrix}

Step 3: Solve the Normal Equations for `x̂`

We now have a clean, solvable system: $(A^TA)\hat{x} = A^Tb$ .

\begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 13 \\ 6 \end{bmatrix}

This is a standard 2x2 system. We can solve it via elimination or substitution. The augmented matrix is:

\left[ \begin{array}{cc|c} 14 & 6 & 13 \\ 6 & 3 & 6 \end{array} \right]

Simplify Row 2 by dividing by 3: $\left[ \begin{array}{cc|c} 14 & 6 & 13 \\ 2 & 1 & 2 \end{array} \right]$

From Row 2, we have $2m + c = 2$ , so $c = 2 - 2m$ . Substitute into Row 1:

$14m + 6(2 - 2m) = 13 \implies 14m + 12 - 12m = 13 \implies 2m = 1 \implies m = 0.5$

Now find $c$ : $c = 2 - 2(0.5) = 1$

The solution is $\hat{x} = \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}$ .

Step 4: Interpret the Result

The values $m=0.5$ and $c=1$ define the line of best fit for our data. The equation is:

y = 0.5x + 1

This is the line that minimizes the sum of the squared vertical distances from our data points to the line.

Lesson 5.2: The Geometry of "Best Fit": Projections

Lesson 5.4: The Problem with the Normal Equations