Lesson 4.0: The Quest for the 'Best' Line: Simple Linear Regression (SLR)

Welcome to Module 4. We now put our theory into practice with the most important model in all of quantitative analysis. This lesson introduces the fundamental problem: how do we find the single best straight line that describes a cloud of data points? This is the foundation of econometrics and predictive modeling.

Part 1: Separating Signal from Noise

Imagine you have data on students' study hours ($X$) and their final exam scores ($Y$). You plot them and see a rough, upward-trending cloud of points. Your brain immediately sees a pattern: more studying is associated with higher scores. This pattern is the **signal**.

But the relationship isn't perfect. Some students who studied a lot did poorly, and some who studied a little did well. This randomness is the **noise**. The goal of linear regression is to separate the signal from the noise.

We formalize this by assuming a **"true" underlying model**:

The True Model: Signal + Noise

$$Y_i = \underbrace{(\beta_0 + \beta_1 X_i)}_{\text{Signal: The True Line}} + \underbrace{\epsilon_i}_{\text{Noise: The Irreducible Error}}$$
  • $\beta_0$ and $\beta_1$: The **true parameters** (intercept and slope) of the signal. They are fixed, unknown constants we want to estimate.
  • $\epsilon_i$: The **error term**. This represents all the unobserved factors that affect $Y_i$ besides $X_i$ (luck, innate talent, etc.). It's a random variable we can never see.
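
To make the signal-plus-noise idea concrete, here is a minimal simulation sketch in Python. The parameter values, noise scale, and sample size are illustrative assumptions, not estimates from any real data:

```python
import numpy as np

# Minimal simulation of the "true" model Y_i = beta_0 + beta_1*X_i + eps_i.
# The values below (50, 5, noise sd of 8) are illustrative assumptions.
rng = np.random.default_rng(42)

beta_0, beta_1 = 50.0, 5.0              # "true" intercept and slope (unknown in practice)
n = 100

X = rng.uniform(0, 10, size=n)          # study hours
epsilon = rng.normal(0, 8, size=n)      # unobservable noise
Y = beta_0 + beta_1 * X + epsilon       # observed scores = signal + noise
```

In real data we only ever observe the pairs $(X_i, Y_i)$; the `beta_0`, `beta_1`, and `epsilon` used in the simulation are exactly the quantities we never get to see.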

Part 2: Anatomy of a Regression

Since we can't see the true line, we must use our data to create an **estimated line**, called the **fitted line**. We denote our estimates with "hats" ($\hat{\ }$).

Imagine a scatter plot of data points. A straight line runs through it. For one point $(X_i, Y_i)$, a vertical line drops to the regression line (at $\hat{Y}_i$). The length of this vertical line is the residual ($e_i$).

The Key Components
  • The Fitted Line: Our best guess for the signal. It's the line we actually calculate.
    $$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

    $\hat{Y}_i$ is our **predicted value** of $Y$ for a given $X$.

  • The Residual ($e_i$): The observable error of our fitted line. It's the difference between the actual data point and our prediction for that point.
    $$e_i = Y_i - \hat{Y}_i$$

    The residual $e_i$ is our sample-based estimate of the unobservable true error $\epsilon_i$.
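
A short sketch of this anatomy, using a tiny made-up sample and arbitrary placeholder estimates (these are not the OLS values, just candidates to illustrate fitted values and residuals):

```python
import numpy as np

# Tiny illustrative sample (made-up numbers): study hours and exam scores.
X = np.array([1.0, 3.0, 5.0, 8.0])
Y = np.array([55.0, 68.0, 70.0, 92.0])

# Arbitrary candidate estimates -- placeholders, not the OLS solution.
beta0_hat, beta1_hat = 52.0, 4.5

Y_hat = beta0_hat + beta1_hat * X   # predicted values on the fitted line
e = Y - Y_hat                       # residuals: actual minus predicted

print(Y_hat)   # one prediction per observation
print(e)       # one residual per observation
```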

Part 3: Defining the 'Best' Line

How do we find the "best" estimates $\hat{\beta}_0$ and $\hat{\beta}_1$? We need a criterion. The goal is to find the line that makes our observable errors (the residuals, $e_i$) as small as possible.

The 'Aha!' Moment: Minimizing Squared Errors

How do we measure the total size of all our errors?

  • Sum them ($\sum e_i$)? Bad idea. Large positive and negative errors would cancel out, making a terrible line look perfect.
  • Sum their absolute values ($\sum |e_i|$)? Better, but the absolute value function is not differentiable at zero, which makes it awkward to minimize with calculus.
  • The Genius Idea: Sum the **squares** of the errors ($\sum e_i^2$).
    • This makes all errors positive.
    • It heavily penalizes large errors (a single error of 10 contributes 100 to the sum, while ten errors of 1 contribute only 10).
    • It results in a smooth, differentiable function that is easy to minimize with calculus.
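
The contrast between these criteria is easy to see with a few made-up residuals (a hypothetical example, not from any dataset):

```python
import numpy as np

# Residuals from a hypothetical terrible line and a hypothetical decent line.
e_terrible = np.array([10.0, -10.0, 8.0, -8.0])   # large misses that offset each other
e_decent   = np.array([1.0, -1.0, 2.0, -2.0])     # small misses

print(e_terrible.sum(), e_decent.sum())                  # 0.0 and 0.0 -- plain sums can't tell them apart
print(np.abs(e_terrible).sum(), np.abs(e_decent).sum())  # 36.0 vs 6.0 -- better, but awkward for calculus
print((e_terrible**2).sum(), (e_decent**2).sum())        # 328.0 vs 10.0 -- squares separate them and punish big misses
```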

The Ordinary Least Squares (OLS) Objective Function

The goal of OLS is to find the specific values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that **minimize** the **Sum of Squared Residuals (SSR)**.

$$\min_{\hat{\beta}_0, \hat{\beta}_1} \text{SSR} = \min \sum_{i=1}^n e_i^2 = \min \sum_{i=1}^n (Y_i - \hat{Y}_i)^2$$
$$= \min \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$$
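
Before deriving the closed-form solution, it can help to see this objective minimized numerically. Below is a minimal sketch that hands the SSR function to a general-purpose optimizer (scipy's `minimize`); the dataset is made up and the starting guess is arbitrary:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up illustrative data: study hours and exam scores.
X = np.array([1.0, 2.0, 4.0, 6.0, 9.0])
Y = np.array([58.0, 61.0, 70.0, 77.0, 95.0])

def ssr(params):
    b0_hat, b1_hat = params
    residuals = Y - (b0_hat + b1_hat * X)   # e_i = Y_i - Y_hat_i
    return np.sum(residuals**2)             # sum of squared residuals

result = minimize(ssr, x0=[0.0, 0.0])       # numerically search for the minimizing pair
print(result.x)                             # estimated (intercept, slope)
print(result.fun)                           # minimized SSR
```

The closed-form OLS formulas derived in the next lesson give this answer directly, with no iterative search.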

What's Next? The Derivation

We have defined our objective. We have a clear mountain to climb (or in this case, a valley to find the bottom of). We want to find the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize this SSR function.

This is a classic optimization problem that can be solved with basic calculus. In the next lesson, we will perform the full, "no-skip" mathematical derivation to find the famous formulas for the OLS estimators.