Linear Regression

The workhorse of quantitative analysis: modeling the relationship between variables.

Finding the "Best Fit" Line

Linear regression is a technique used to model the relationship between a dependent variable (like a stock's return) and one or more independent variables (like the overall market's return). With a single independent variable, the model is:

Y = β₀ + β₁X + ε

Y is the dependent variable (what you're trying to predict).
X is the independent variable (the predictor).
β₁ (Beta 1) is the slope: how much Y is expected to change for a one-unit change in X.
β₀ (Beta 0) is the intercept: the expected value of Y when X is 0.
ε (epsilon) is the error term: the random noise or unexplained part of Y. Once a line has been fitted, the estimated error for each data point is called its residual.
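To make the pieces of the equation concrete, here is a minimal Python sketch that generates data from this model. The function name simulate_linear_data, the parameter values, and the choice of Gaussian noise for ε are illustrative assumptions, not something the model itself requires.

```python
import random


def simulate_linear_data(beta0, beta1, n, noise_sd, seed=0):
    """Draw n points from Y = beta0 + beta1 * X + eps.

    X is drawn uniformly from [-1, 1] and eps is Gaussian noise with
    standard deviation noise_sd (both are illustrative choices).
    """
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


# Hypothetical parameters: intercept 0.5, slope 2.0, moderate noise.
xs, ys = simulate_linear_data(beta0=0.5, beta1=2.0, n=5, noise_sd=0.5)
for x, y in zip(xs, ys):
    print(f"X = {x:+.3f}  Y = {y:+.3f}")
```

Setting noise_sd to 0 puts every point exactly on the line; larger values scatter the points around it.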

How Does It Work? Ordinary Least Squares (OLS)

The "best fit" line isn't just an eyeball estimate. It's found using a method called Ordinary Least Squares (OLS). The goal of OLS is to find the specific values for the slope (β₁) and intercept (β₀) that minimize the sum of the squared residuals.

A residual is the signed vertical distance between an actual data point and the regression line; it's the error for that specific point. We square these errors so that positive and negative errors don't cancel each other out, which also gives more weight to larger errors. As long as X takes at least two distinct values, OLS finds the one unique line that makes this total squared error as small as possible.
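For a single predictor, the minimization has a well-known closed-form solution: the slope is the sum of (x - x̄)(y - ȳ) divided by the sum of (x - x̄)², and the intercept is ȳ minus the slope times x̄. The sketch below implements that in plain Python. The function names and the four data points are made up for illustration.

```python
def ols_fit(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of X, and co-deviations of X and Y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


def sum_squared_residuals(xs, ys, beta0, beta1):
    """Total squared vertical distance between the points and a given line."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = ols_fit(xs, ys)
print(round(b1, 2), round(b0, 2))  # slope about 1.94, intercept about 0.15
print(sum_squared_residuals(xs, ys, b0, b1))
```

Nudging b0 or b1 away from the fitted values and calling sum_squared_residuals again shows the total squared error going up, which is what "best fit" means here.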

Application: The Capital Asset Pricing Model (CAPM)

A foundational model in finance that is a direct application of linear regression.

CAPM models a stock's excess return (its return minus the risk-free rate) as a function of the overall market's excess return. The slope of this regression line, known as "Beta" (β), measures the stock's systematic risk. A Beta > 1 means the stock tends to amplify the market's moves; a Beta < 1 means it tends to move less than the market. The intercept, "Alpha" (α), represents the average excess return the stock earns that isn't explained by the market. Under CAPM, alpha should theoretically be zero, which is why a persistently positive alpha is the holy grail for portfolio managers.
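As a rough sketch of how alpha and beta could be estimated, the Python below regresses stock excess returns on market excess returns using the single-predictor OLS formulas. The function name estimate_alpha_beta, the constant per-period risk-free rate, and all return figures are assumptions made up for illustration; they are not real market data.

```python
def estimate_alpha_beta(stock_returns, market_returns, risk_free_rate):
    """Regress stock excess returns on market excess returns.

    Returns (alpha, beta). Assumes one constant risk-free rate per period,
    which is a simplification.
    """
    if len(stock_returns) != len(market_returns) or len(stock_returns) < 2:
        raise ValueError("need two equally long return series with at least 2 points")
    # Excess returns: subtract the risk-free rate from both series.
    y = [r - risk_free_rate for r in stock_returns]
    x = [r - risk_free_rate for r in market_returns]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("market returns have no variation")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta = sxy / sxx                 # slope: sensitivity to the market
    alpha = y_mean - beta * x_mean   # intercept: return not explained by the market
    return alpha, beta


# Made-up monthly returns, for illustration only.
market = [0.020, -0.010, 0.035, 0.005, -0.025, 0.015]
stock = [0.031, -0.018, 0.049, 0.004, -0.041, 0.024]
risk_free = 0.002

alpha, beta = estimate_alpha_beta(stock, market, risk_free)
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")
```

With only a handful of observations, the estimates are very noisy; in practice, a much longer return history would be used.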

Key Metrics & Assumptions

R-Squared (or the coefficient of determination) tells you what proportion of the variation in the dependent variable (the stock's return) can be explained by the independent variable (the market's return). An R-Squared of 0.85 means that 85% of the variance in the stock's returns can be explained by movements in the overall market. It's a measure of how well your model fits the data. As noise in the data increases, the data points spread out from the line and the R-Squared value drops, indicating a weaker fit.
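R-Squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of Y. The sketch below does that for simulated data at three noise levels. The helper names, the true line (intercept 1, slope 2), and the noise levels are hypothetical choices for illustration.

```python
import random


def fit_line(xs, ys):
    """Single-predictor OLS: return (beta0, beta1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    return y_mean - beta1 * x_mean, beta1


def r_squared(xs, ys):
    """R-Squared of the OLS line: 1 - SS_residual / SS_total."""
    beta0, beta1 = fit_line(xs, ys)
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("Y has no variation, so R-Squared is undefined")
    return 1.0 - ss_res / ss_tot


def noisy_line(noise_sd, n=200, seed=42):
    """Simulate Y = 1 + 2X + Gaussian noise (illustrative parameters)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


# More noise should give a lower R-Squared.
for noise_sd in (0.1, 0.5, 2.0):
    xs, ys = noisy_line(noise_sd)
    print(f"noise sd = {noise_sd}: R-Squared = {r_squared(xs, ys):.3f}")
```

The underlying line is the same in all three runs; only the noise changes, so the drop in R-Squared reflects a weaker fit rather than a different relationship.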
