Building the mathematical machinery to find the closest possible solution.
In our last lesson, we faced a hard truth: most real-world systems `Ax=b` have no exact solution. We redefined our goal: instead of trying to hit the unreachable target `b`, we will aim for the closest point to it that we *can* hit.
This closest point, we said, is the **orthogonal projection** of `b` onto the Column Space of `A`. Today, we build the geometric machinery to find that projection. We will start with the simplest case imaginable and build up to the general, powerful formula.
Part 1: Projection onto a Line
The simplest case is projecting one vector onto another. This forms the basis for everything else.
Imagine a single vector `a` in space, which defines a line. Now, imagine another vector `b` that is not on this line. The closest point on the line to `b` is the **orthogonal projection of `b` onto `a`**, which we'll call `p`.
The key feature is that the error vector, `e = b − p`, must be **orthogonal** to the vector `a` that defines the line. This means their dot product is zero:

$$a \cdot (b - p) = 0$$
We also know `p` must be a scaled version of `a`, so `p = x̂a` for some scalar `x̂`. Substituting this in:

$$a \cdot (b - \hat{x}a) = 0 \implies a \cdot b - \hat{x}\,(a \cdot a) = 0$$

Solving for our unknown scalar `x̂` gives:

$$\hat{x} = \frac{a \cdot b}{a \cdot a} = \frac{a^T b}{a^T a}$$

And the projection vector `p` itself is:

$$p = \hat{x}\,a = \left(\frac{a^T b}{a^T a}\right) a$$
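To make this concrete, here is a minimal NumPy sketch (the vectors `a` and `b` are made-up example values) that computes `x̂` and `p`, then checks that the error really is orthogonal to `a`:

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])  # vector defining the line (example values)
b = np.array([3.0, 1.0, 4.0])  # vector to project (example values)

# x̂ = (aᵀb) / (aᵀa)
x_hat = (a @ b) / (a @ a)

# p = x̂ a, the projection of b onto the line through a
p = x_hat * a

# The error e = b − p should be orthogonal to a
e = b - p
print("x̂ =", x_hat)      # 13/9 ≈ 1.444
print("p =", p)
print("a · e =", a @ e)   # ~0, up to floating-point error
```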
Part 2: Projection onto a Subspace
Now we generalize this to project a vector `b` onto an entire subspace, like the Column Space of a matrix `A`.
Let the columns of `A` be the linearly independent vectors `a₁, a₂, ..., aₙ`; together they form a basis for its Column Space. The projection `p` is in this space, so it must be a linear combination of these basis vectors:
$$p = \hat{x}_1 a_1 + \hat{x}_2 a_2 + \cdots + \hat{x}_n a_n = A\hat{x}$$
Here, `x̂` is the vector of coefficients we need to find. The error `e = b - p` must be orthogonal to the *entire* subspace, meaning it's orthogonal to every basis vector `aᵢ`.
$$\begin{cases} a_1^T(b - A\hat{x}) = 0 \\ a_2^T(b - A\hat{x}) = 0 \\ \quad\vdots \\ a_n^T(b - A\hat{x}) = 0 \end{cases}$$
Since the rows of `Aᵀ` are exactly `a₁ᵀ, a₂ᵀ, ..., aₙᵀ`, this entire system of equations can be written in a single, compact matrix form:

$$A^T(b - A\hat{x}) = 0$$
Rearranging this gives us the magnificent **Normal Equations**:
$$A^T A \hat{x} = A^T b$$
We have converted our original, unsolvable system `Ax=b` into a new, smaller, **always solvable** square system: because the columns of `A` are linearly independent, the n×n matrix `AᵀA` is invertible, so the Normal Equations have exactly one solution, the best approximate solution `x̂`.
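As a sanity check, here is a small NumPy sketch (the 4×2 matrix `A` and the vector `b` are made-up example values for an overdetermined system) that solves the Normal Equations directly and compares the answer to NumPy's built-in least-squares solver:

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (example values)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal Equations: AᵀA x̂ = Aᵀb
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# p = A x̂ is the projection of b onto the Column Space of A,
# and the error e = b − p is orthogonal to every column of A
p = A @ x_hat
e = b - p
print("x̂ =", x_hat)     # [0.9, 0.9] for these example values
print("Aᵀe =", A.T @ e)  # ~0, up to floating-point error

# Cross-check against NumPy's least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print("match:", np.allclose(x_hat, x_lstsq))  # True
```

One design note: forming `AᵀA` explicitly can amplify rounding error when `A` is ill-conditioned, which is why library routines like `np.linalg.lstsq` rely on a different factorization internally; for a well-conditioned toy problem like this one, the two answers agree.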
**Up Next:** We will take the Normal Equations we just derived and use them as our primary tool to solve a real-world linear regression problem from start to finish.