Lesson 5.6: The Mathematics of PCA
We have the intuition for PCA: find the directions of maximum variance. Now we prove that this problem is mathematically identical to finding the eigenvectors of the covariance matrix. This lesson is a masterclass in applying the linear algebra of eigendecomposition to a real-world statistical problem.
Part 1: Framing the Problem - Maximizing Variance
Let $X$ be our $n \times p$ data matrix, centered and scaled. A single observation (row) is $x_i \in \mathbb{R}^p$. The sample covariance matrix is $C = \frac{1}{n-1} X^\top X$.
A Principal Component is a linear combination of the original features. We can define the first Principal Component, $w_1$, as a $p \times 1$ vector of weights.
The "score" of the i-th observation on this component is the projection of that observation onto the weight vector:
The vector of all $n$ scores is $t_1 = X w_1$. PCA's goal is to find the weights $w_1$ that **maximize the variance** of these scores.
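To make the setup concrete, here is a minimal NumPy sketch (a toy illustration with variable names `X`, `C`, and `w` chosen to mirror the notation above) that centers and scales a small data matrix, forms the sample covariance matrix, and computes the score variance for an arbitrary unit-length weight vector:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                      # toy data: n=200 observations, p=4 features
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # center and scale each column

C = X.T @ X / (X.shape[0] - 1)                     # sample covariance matrix C = X'X / (n-1)

w = np.array([1.0, 1.0, 0.0, 0.0])
w /= np.linalg.norm(w)                             # an arbitrary unit-length weight vector

t = X @ w                                          # scores: projection of each row onto w
print(np.var(t, ddof=1))                           # variance of the scores...
print(w @ C @ w)                                   # ...equals the quadratic form w'Cw
```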
The Optimization Problem for PC1
Maximize the variance of the scores, subject to the constraint that the weight vector has a length of 1 (to ensure a unique solution):

$$\max_{w_1} \; \operatorname{Var}(X w_1) \quad \text{subject to} \quad w_1^\top w_1 = 1$$
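Before deriving the closed-form answer, we can attack this problem by brute force. The sketch below (our own illustration, not part of the formal derivation) samples many random unit vectors and keeps the one with the highest score variance; the eigendecomposition in Part 2 recovers the same direction exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = X.T @ X / (X.shape[0] - 1)

# Sample random directions on the unit sphere and keep the best one.
W = rng.normal(size=(100_000, 4))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # normalize each row to unit length
variances = np.einsum("ij,jk,ik->i", W, C, W)    # quadratic form w'Cw for every candidate w
best = W[variances.argmax()]
print(best, variances.max())                     # approximates PC1 and its variance
```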
Part 2: The Eigendecomposition Solution
Let's simplify the objective function. The variance of a linear combination is given by $\operatorname{Var}(X w_1) = w_1^\top C w_1$ (from Module 2).
So our problem is now:

$$\max_{w_1} \; w_1^\top C w_1 \quad \text{subject to} \quad w_1^\top w_1 = 1$$
The Lagrangian and the Eigenvector Connection
This is a constrained optimization problem, which we solve using a Lagrange multiplier, $\lambda$:

$$\mathcal{L}(w_1, \lambda) = w_1^\top C w_1 - \lambda \left( w_1^\top w_1 - 1 \right)$$
To find the maximum, we take the derivative with respect to $w_1$ and set it to zero (using the identity $\frac{\partial}{\partial w}\left(w^\top C w\right) = 2Cw$, which holds because $C$ is symmetric):

$$\frac{\partial \mathcal{L}}{\partial w_1} = 2 C w_1 - 2 \lambda w_1 = 0$$
This simplifies to:

$$C w_1 = \lambda w_1$$
This is the fundamental eigenvector equation: $C w_1 = \lambda w_1$.
This stunning result proves that the vector that maximizes the variance must be an **eigenvector** of the covariance matrix $C$.
Which eigenvector? Let's pre-multiply both sides by $w_1^\top$:

$$w_1^\top C w_1 = \lambda \, w_1^\top w_1$$
Since $w_1^\top w_1 = 1$, this becomes $\operatorname{Var}(X w_1) = w_1^\top C w_1 = \lambda$. To maximize the variance, we must choose the eigenvector corresponding to the **largest eigenvalue**, $\lambda_1$. This is PC1.
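We can verify this numerically with `numpy.linalg.eigh`, which is appropriate here because $C$ is symmetric (note that `eigh` returns eigenvalues in ascending order, so we flip to put the largest first). The toy data setup mirrors the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = X.T @ X / (X.shape[0] - 1)

eigvals, eigvecs = np.linalg.eigh(C)                 # ascending eigenvalues for symmetric C
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder: largest eigenvalue first

w1 = eigvecs[:, 0]                 # PC1 = eigenvector with the largest eigenvalue
print(np.var(X @ w1, ddof=1))      # variance of the PC1 scores...
print(eigvals[0])                  # ...matches the largest eigenvalue, lambda_1
```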
The Principal Components are the Eigenvectors
- Principal Component 1 ($w_1$): The eigenvector of $C$ corresponding to the largest eigenvalue, $\lambda_1$.
- Principal Component 2 ($w_2$): The eigenvector corresponding to the second-largest eigenvalue, $\lambda_2$, and so on.
Part 3: Explained Variance - The Role of Eigenvalues
We've shown that the variance of the scores on a principal component is equal to the component's eigenvalue:

$$\operatorname{Var}(X w_j) = \lambda_j$$
This gives us a natural way to measure the "importance" of each component. The total variance in the dataset is the sum of the variances of all its features, which is also equal to the sum of all the eigenvalues (the trace of $C$):

$$\text{Total Variance} = \operatorname{tr}(C) = \sum_{j=1}^{p} \lambda_j$$
Proportion of Variance Explained
The proportion of total variance explained by the $j$-th principal component is:

$$\text{PVE}_j = \frac{\lambda_j}{\sum_{k=1}^{p} \lambda_k}$$
To find the cumulative variance explained by the first $k$ components, we just sum their individual proportions, $\sum_{j=1}^{k} \text{PVE}_j$. This is what we look at to decide how many components to keep.
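The bookkeeping follows directly from the eigenvalues. A short sketch, continuing the same toy setup (the 90% threshold is just an illustrative choice, not a rule):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
C = X.T @ X / (X.shape[0] - 1)

eigvals = np.linalg.eigvalsh(C)[::-1]            # eigenvalues, largest first
print(np.isclose(eigvals.sum(), np.trace(C)))    # total variance = tr(C) = sum of eigenvalues

pve = eigvals / eigvals.sum()                    # proportion of variance explained per PC
cumulative = np.cumsum(pve)                      # cumulative variance explained
k = int(np.searchsorted(cumulative, 0.90)) + 1   # components needed to reach 90% (illustrative)
print(pve, cumulative, k)
```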
What's Next? Putting PCA to Work
We have now forged the complete theoretical link between the statistical goal of maximizing variance and the linear algebra tool of eigendecomposition.
It's time to see how this powerful technique is actually used in quantitative finance. In the next lesson, we will explore practical applications, such as using PCA to build custom market indices, create statistical risk factors, and denoise correlation matrices for more robust portfolio optimization.