Lesson 6.7: The Perils of Backtesting

A backtest is an experiment, and like any experiment, it can be flawed. This lesson covers the most common and dangerous mistakes in backtesting, including look-ahead bias, survivorship bias, and data snooping. Understanding these pitfalls is the difference between a strategy that looks good on paper and one that works in the real world.

Part 1: The Illusion of Past Performance

A backtest simulates how a trading strategy would have performed on historical data. A good backtest can provide valuable insights, but a flawed one can create a dangerous illusion of profitability. Many hedge funds have failed by launching strategies based on impressive but ultimately misleading backtests.

Part 2: The Three Deadly Sins of Backtesting

Sin #1: Look-Ahead Bias

This is the most common technical error. It occurs when your simulation uses information that would not have been available at the time of the trade. (A short code sketch after the examples below shows the classic off-by-one version of this mistake.)

Examples:

  • Using the closing price to make a trading decision at the market open.
  • Calculating a rolling average for day `t` using data from `t+1`.
  • Using financial statement data (e.g., earnings) as of the fiscal period it covers, rather than the date it was actually released and disseminated to the public.
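
To make the off-by-one version concrete, here is a minimal sketch in pandas, using synthetic prices and an arbitrary moving-average crossover rule (the window lengths and random seed are invented for illustration). The only difference between the two backtests is a single `shift(1)`:

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices -- purely illustrative.
rng = np.random.default_rng(42)
prices = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))),
    index=pd.bdate_range("2020-01-01", periods=500),
    name="close",
)

# A simple moving-average crossover signal computed from closes.
fast = prices.rolling(10).mean()
slow = prices.rolling(50).mean()
signal = (fast > slow).astype(int)
returns = prices.pct_change()

# WRONG: signal[t] is computed from close[t], but pairing it with
# return[t] assumes we traded at close[t-1] -- before the signal
# existed. This is look-ahead bias.
biased_pnl = (signal * returns).cumsum()

# RIGHT: lag the signal one bar, so the position held over day t
# was decided using only information available through day t-1.
honest_pnl = (signal.shift(1) * returns).cumsum()

print(f"Biased final PnL: {biased_pnl.iloc[-1]:+.3f}")
print(f"Honest final PnL: {honest_pnl.iloc[-1]:+.3f}")
```

The biased version will typically look better, because the unlagged signal is mildly correlated with the same-day return it is being credited with, despite describing a trade that was impossible to execute.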

Sin #2: Survivorship Bias

This occurs when your historical dataset only includes the "survivors."

Example:

You build a strategy by backtesting on the *current* components of the S&P 500 index. This is a flawed experiment because your dataset implicitly excludes all the companies that were in the S&P 500 in the past but have since gone bankrupt or been acquired. Your results will be artificially inflated because you only tested on the "winners."
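
The fix is a point-in-time universe: a record of when each name entered and left the index, so the backtest on any given date sees only the companies that were actually members then. Below is a minimal sketch of the idea, with invented tickers and dates; real work would use a survivorship-bias-free dataset from a vendor such as CRSP.

```python
import pandas as pd

# Hypothetical membership table: when each ticker joined and left
# the index. NaT in "removed" means it is still a member today.
membership = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "added": pd.to_datetime(["2005-03-01", "2001-07-15", "2010-01-04"]),
    "removed": pd.to_datetime(["2012-06-30", None, None]),
})

def universe_on(date) -> list:
    """Tickers that were actually in the index on `date`."""
    date = pd.Timestamp(date)
    mask = (membership["added"] <= date) & (
        membership["removed"].isna() | (membership["removed"] > date)
    )
    return membership.loc[mask, "ticker"].tolist()

# In 2011, the later-delisted name AAA is still in the universe:
print(universe_on("2011-01-03"))  # ['AAA', 'BBB', 'CCC']
# Today only the survivors remain; backtesting on *this* list
# for all of history is exactly the bias described above:
print(universe_on("2024-01-02"))  # ['BBB', 'CCC']
```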

Sin #3: Data Snooping (Overfitting)

This is the human equivalent of overfitting a machine learning model. It occurs when a researcher tries hundreds of different strategies on the same dataset until, by pure chance, they find one that looks good. That strategy is not capturing a real market anomaly; it is capturing the random noise specific to that historical period.
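
A quick simulation makes the danger tangible. Below, 500 "strategies" that are literally coin flips are run on the same stretch of pure noise, and the best of them still looks like a real edge. (All numbers are arbitrary; the point is the selection effect, not the specific values.)

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 days of zero-mean "market" returns: pure noise, so no
# strategy can have genuine predictive power here.
n_days, n_strategies = 1000, 500
market = rng.normal(0, 0.01, n_days)

# Each candidate "strategy" is a random +/-1 position every day.
positions = rng.choice([-1, 1], size=(n_strategies, n_days))
pnl = positions * market

# Annualized Sharpe ratio of each strategy on this one dataset.
sharpes = pnl.mean(axis=1) / pnl.std(axis=1) * np.sqrt(252)

print(f"Best Sharpe across {n_strategies} tries: {sharpes.max():.2f}")
# Typically prints a Sharpe well above 1.0: an "edge" discovered
# in data that, by construction, contains none.
```

The more strategies you try on one dataset, the higher the best in-sample Sharpe you should expect to find by luck alone, which is why the number of discarded experiments matters as much as the one survivor.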

Part 3: The Solution - Rigorous Validation

To combat these biases, quants use more sophisticated validation techniques.

  • Walk-Forward Validation: This is a more robust version of our simple train-test split. You train your model on a period of data (e.g., 2010-2015), test it on the next period (2016), then move the window forward: train on 2010-2016, test on 2017, and so on. (Because the training set grows at each step, this variant is often called an expanding or anchored window.) This better simulates how a strategy would be retrained and deployed in real life; a runnable sketch follows this list.
  • Out-of-Sample Testing: A truly robust strategy should work across different asset classes, different markets, and different time periods. Taking a strategy that worked on US stocks and testing it on international stocks is a good way to see whether it captured a genuine economic phenomenon or just a local data quirk.
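
Here is a minimal sketch of the expanding-window scheme described above, using synthetic features and a plain Ridge regression as a stand-in for whatever model you actually use. The column names, date range, and model choice are all placeholders; scikit-learn's `TimeSeriesSplit` offers a ready-made version of the same idea.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic daily data for illustration: two noise features and a
# noisy target. Substitute your own features and forward returns.
rng = np.random.default_rng(1)
idx = pd.bdate_range("2010-01-01", "2020-12-31")
df = pd.DataFrame(rng.normal(size=(len(idx), 3)),
                  index=idx, columns=["f1", "f2", "target"])

scores = {}
for test_year in range(2016, 2021):
    # Expanding window: train on everything before the test year...
    train = df[df.index.year < test_year]
    # ...then evaluate on the single held-out year that follows.
    test = df[df.index.year == test_year]

    model = Ridge().fit(train[["f1", "f2"]], train["target"])
    preds = model.predict(test[["f1", "f2"]])
    scores[test_year] = mean_squared_error(test["target"], preds)

for year, mse in scores.items():
    print(f"test {year}: MSE = {mse:.4f}")
```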

What's Next? Advanced Concepts

We've now completed our whirlwind tour of classical time series analysis and the practicalities of applying ML models to sequential data.

The final lesson in this module will touch upon a more advanced concept that bridges the gap between differencing and levels data: **Fractional Differentiation**. This technique allows us to achieve stationarity while preserving as much of the data's original "memory" as possible, a key idea in modern quantitative finance.