The Inexact Problem: Why Ax=b Often Has No Solution

Welcome to Module 5. We are now stepping out of the clean, theoretical world of perfect mathematical systems and into the messy, noisy, but far more realistic world of data.

In our journey so far, when we solved `Ax=b`, we implicitly assumed that a perfect solution existed. But in almost every practical application—from economics to machine learning, from engineering to biology—the single most common answer to the question "Does `Ax=b` have a solution?" is a resounding NO.

Today, we will understand *why* this is the case, what it means, and how it forces us to redefine our entire concept of a "solution."

The Reality of Data: Overdetermined Systems
Imagine you are a scientist trying to find a linear relationship, `y = mx + c`, between two variables. You run an experiment and collect some data points.
| x (Temperature) | y (Pressure) |
| --- | --- |
| 1 | 1 |
| 2 | 3 |
| 3 | 2 |

Each data point gives us one equation:

  • For (1, 1): `m(1) + c = 1`
  • For (2, 3): `m(2) + c = 3`
  • For (3, 2): `m(3) + c = 2`

Let's write this in our familiar `Ax=b` form. Our unknowns are `m` and `c`, so our unknown vector is `x = [m, c]ᵀ`.

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} m \\ c \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}}_{b}$$

Notice the dimensions. `A` is a `3x2` matrix. We have **3 equations** but only **2 unknowns**. This is an **overdetermined system**: more constraints (data points) than degrees of freedom (parameters). Geometrically, each equation defines a line in the `(m, c)`-plane, and unless your data is perfectly, miraculously linear (which real data never is), three such lines will not pass through a single common point.
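As a quick sanity check, here is a minimal NumPy sketch (the choice of NumPy is mine, not part of the lecture) confirming that this particular system has no exact solution. It uses the rank test: `b` lies in the column space of `A` exactly when appending `b` as an extra column does not increase the rank.

```python
import numpy as np

# The system from the example: 3 equations (rows), 2 unknowns (m and c).
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# Ax = b is exactly solvable only if b lies in C(A), i.e. appending b
# as an extra column does not increase the rank.
rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
print(rank_A, rank_Ab)  # 2 3 -> b is outside C(A): no exact solution
```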

The New Goal: Get as Close as Possible
If `Ax=b` has no solution, we must change our goal. We now seek an `x` that makes `Ax` as **close** to `b` as possible.

We define the **error vector** `e` as the difference between what we *want* (`b`) and what we *get* (`Ax`).

$$e = b - Ax$$

The official goal of **Least Squares** is to find the special vector `x̂` ("x-hat") that minimizes the squared length of the error vector:

$$\text{Find } \hat{x} \text{ that minimizes } \|b - Ax\|^2$$

This `x̂` is our **best approximate solution**. For our problem, it will give us the line of best fit.
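We will derive how to compute `x̂` in the coming lectures. For now, here is a sketch that leans on NumPy's built-in least-squares solver to peek at the answer (again, NumPy is an assumption on my part, not the lecture's tool of choice):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

# np.linalg.lstsq returns the x̂ that minimizes ||b - Ax||^2.
x_hat, sq_error, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)
m, c = x_hat
print(f"best-fit line: y = {m:.2f}x + {c:.2f}")  # y = 0.50x + 1.00
print(sq_error)                                  # [1.5] = ||b - A x̂||^2
```

For our data this gives the best-fit line `y = 0.5x + 1`, with a minimal squared error of 1.5: no line does better.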

The Geometric Insight: Projection
The key to solving this lies in the Column Picture.

An exact solution exists only if `b` is in the **Column Space of A**. When there's no solution, it means `b` is **outside** of `C(A)`. In our example, `C(A)` is a plane in 3D space, and `b` is a point not on that plane.

The closest point to `b` within the Column Space is its **orthogonal projection** onto that subspace. Let's call this projection vector `p`.

The new, solvable quest is: Find `x̂` such that `Ax̂ = p`.

The error vector `e = b - p` will be **orthogonal** to the Column Space, a crucial fact we will use to find the solution.
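To make the geometry concrete, a small sketch (reusing the NumPy solver from above) that computes `p = Ax̂` and verifies that the error `e = b - p` is orthogonal to the column space. Since `C(A)` is spanned by the columns of `A`, it suffices to check `Aᵀe = 0`:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

p = A @ x_hat   # projection of b onto C(A): the closest point we can reach
e = b - p       # the error: what is left over, pointing "out of" C(A)

# e is orthogonal to every column of A, hence to all of C(A):
print(A.T @ e)  # ~[0, 0] up to floating-point round-off
print(p)        # [1.5, 2.0, 2.5]
```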

Summary: A New Mindset
  1. **The Reality:** Most real-world systems are **overdetermined** and have no exact solution.
  2. **The New Goal:** Find the **least-squares solution `x̂`** that minimizes the error `||b - Ax||²`.
  3. **The Geometry:** This is equivalent to finding the **orthogonal projection `p`** of `b` onto the Column Space of `A` and solving `Ax̂ = p`.

**Up Next:** We will develop the formula for **projections**, learning how to find the "shadow" a vector casts onto a line and, more generally, onto any subspace.