The Geometry of "Best Fit": Projections

Building the mathematical machinery to find the closest possible solution.

In our last lesson, we faced a hard truth: most real-world systems `Ax=b` have no solution. We redefined our goal: instead of trying to hit the unreachable target `b`, we will aim for the closest point to it that we *can* hit.

This closest point, we said, is the **orthogonal projection** of `b` onto the Column Space of `A`. Today, we build the geometric machinery to find that projection. We will start with the simplest case imaginable and build up to the general, powerful formula.

Part 1: Projection onto a Line
The simplest case is projecting one vector onto another. This forms the basis for everything else.

Imagine a single vector `a` in space, which defines a line. Now, imagine another vector `b` that is not on this line. The closest point on the line to `b` is the **orthogonal projection of `b` onto `a`**, which we'll call `p`.

The key feature is that the error vector, `e = b - p`, must be **orthogonal** to the vector `a` that defines the line. This means their dot product is zero:

$$a \cdot (b - p) = 0$$

We also know `p` must be a scaled version of `a`, so `p = x̂a` for some scalar `x̂`. Substituting this in:

$$a \cdot (b - \hat{x}a) = 0 \implies a \cdot b - \hat{x}(a \cdot a) = 0$$

Solving for our unknown scalar `x̂` gives:

$$\hat{x} = \frac{a \cdot b}{a \cdot a} = \frac{a^T b}{a^T a}$$
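
As a concrete check, take the (arbitrarily chosen) vectors `a = (1, 2)` and `b = (3, 4)`:

$$\hat{x} = \frac{a^T b}{a^T a} = \frac{1 \cdot 3 + 2 \cdot 4}{1^2 + 2^2} = \frac{11}{5}$$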

And the projection vector `p` itself is:

$$p = \hat{x}a = \left( \frac{a^T b}{a^T a} \right) a$$
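
Here's a minimal NumPy sketch (an illustration, reusing the same arbitrary vectors as above) that carries the example through to `p` and confirms the error is orthogonal to `a`:

```python
import numpy as np

# Arbitrary example vectors (same ones used in the worked example above).
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# Scalar coefficient: x_hat = (a^T b) / (a^T a)
x_hat = (a @ b) / (a @ a)   # 11/5 = 2.2

# Projection of b onto the line through a: p = x_hat * a
p = x_hat * a               # [2.2, 4.4]

# The error e = b - p must be orthogonal to a.
e = b - p
print(x_hat, p, a @ e)      # a @ e is 0 (up to floating-point rounding)
```
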
Part 2: Projection onto a Subspace
Now we generalize this to project a vector `b` onto an entire subspace, like the Column Space of a matrix `A`.

Let the columns of `A` be the linearly independent vectors `a₁, a₂, ..., aₙ`; they form a basis for its Column Space. The projection `p` lies in this space, so it must be a linear combination of these basis vectors:

$$p = \hat{x}_1 a_1 + \hat{x}_2 a_2 + \dots + \hat{x}_n a_n = A\hat{x}$$

Here, `x̂` is the vector of coefficients we need to find. The error `e = b - p` must be orthogonal to the *entire* subspace, meaning it's orthogonal to every basis vector `aᵢ`.

$$\begin{cases} a_1^T(b - A\hat{x}) = 0 \\ a_2^T(b - A\hat{x}) = 0 \\ \vdots \\ a_n^T(b - A\hat{x}) = 0 \end{cases}$$

Since the rows of `Aᵀ` are exactly `a₁ᵀ, a₂ᵀ, ..., aₙᵀ`, this entire system of equations can be written in a single, compact matrix form:

$$A^T(b - A\hat{x}) = 0$$

Rearranging this gives us the magnificent **Normal Equations**:

$$A^T A \hat{x} = A^T b$$

We have converted our original, unsolvable system `Ax=b` into a new, smaller `n × n` square system that is **always solvable**. Because the columns of `A` are linearly independent, `AᵀA` is invertible, so the Normal Equations give a unique best approximate solution `x̂`.
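
Here is a small NumPy sketch of that idea; the matrix and right-hand side are made-up examples, and `np.linalg.lstsq` is included only as an independent cross-check:

```python
import numpy as np

# Made-up overdetermined system: 4 equations, 2 unknowns, no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal Equations: (A^T A) x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Projection of b onto the Column Space of A, and the error vector.
p = A @ x_hat
e = b - p

print("x_hat:", x_hat)
print("A^T e:", A.T @ e)  # ~0: the error is orthogonal to every column of A
print("check:", np.linalg.lstsq(A, b, rcond=None)[0])  # matches x_hat
```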

**Up Next:** We will take the Normal Equations we just derived and use them as our primary tool to solve a real-world linear regression problem from start to finish.