QuantPrep | AI-Powered Learning for Quantitative Finance

In our last lesson, we successfully used the Normal Equations, $A^TA\hat{x} = A^Tb$ , as a powerful recipe to find the line of best fit. It felt clean, simple, and definitive. It is the textbook method and the foundation of our theoretical understanding of least squares.

However, when we move from small, neat textbook examples to the large, messy datasets of the real world, a dark side of the Normal Equations can emerge. Direct computation of $A^TA$ can sometimes lead to serious numerical problems.

Today, we will play the role of a numerical analyst and investigate the primary weakness of this method: the problem of ill-conditioning.

What is 'Conditioning'?

In numerical linear algebra, the condition number of a matrix is a measure of how sensitive the output of a problem is to small changes in the input data.

A matrix with a low condition number is well-conditioned. Small errors or noise in your input data `b` will only lead to small errors in your solution `x̂`. This is a stable, reliable situation.

A matrix with a very high condition number is ill-conditioned. Even tiny, unavoidable rounding errors in your input data can be magnified into massive, catastrophic errors in your final solution. This is a dangerous, unstable situation.

The Vicious Multiplier: `cond(AᵀA) = (cond(A))²`

The matrix we actually have to solve our system with is not `A`, but the new, square matrix `K = AᵀA`. It is a fundamental property of linear algebra that the condition number of `K` is related to the condition number of `A` in a very dramatic way.

\text{condition\_number}(A^TA) = (\text{condition\_number}(A))^2

The process of forming

A^TA

squares the condition number. Why is this so dangerous?

If `A` is slightly ill-conditioned, say `cond(A) = 100`, then `cond(AᵀA) = 100² = 10,000`.

If `A` is significantly ill-conditioned, say `cond(A) = 10⁷`, then `cond(AᵀA) = (10⁷)² = 10¹⁴`.

Numerical Precision at Risk

A condition number of `10¹⁴` is extremely high. Computers typically work with about 16 digits of precision. A condition number this high means that you could lose up to 14 of those 16 digits of accuracy in your final answer `x̂`!

The Need for a Better Way

The Normal Equations are a perfect theoretical tool for understanding the geometry of least squares. However, for serious numerical work, directly computing

A^TA

is often avoided.

We need a more sophisticated algorithm that can solve the least squares problem without ever explicitly forming the `AᵀA` matrix. This need for a stable, high-precision method is the entire motivation for our next topics: the Gram-Schmidt process and the beautiful QR Decomposition.

Lesson 5.4: The Problem with the Normal Equations

What is 'Conditioning'?

The Vicious Multiplier: `cond(AᵀA) = (cond(A))²`

Numerical Precision at Risk

The Need for a Better Way