Lesson 5.5: Advanced Concepts (Meta-Labeling, Feature Importance)

Exploring modern techniques from quantitative finance for building more robust and profitable models.

Meta-Labeling for Bet Sizing

Proposed by Dr. Marcos López de Prado, meta-labeling is a powerful technique for separating a model that decides the **direction** of a bet from a model that decides the **size** of the bet.

The Meta-Labeling Workflow

1. Build a Primary Model: Create a baseline model (e.g., a simple trend-following strategy) that generates binary trading signals (buy or sell). This model determines the *direction* (side) of the bet.
2. Create Meta-Labels: The "meta-label" is not about the direction. It records whether the primary model's signal was *correct* and *profitable*. A meta-label of 1 means the trade taken in the signal's direction met its profit criterion (e.g., its return over the holding period exceeded a chosen threshold); 0 means it did not.
3. Build a Secondary Model: Train a new machine learning model (e.g., XGBoost) to predict these meta-labels. Its features can include the primary signal itself plus context the primary model does not use (e.g., volatility, market regime indicators).
4. Final Decision: Take a trade only when the primary model gives a signal AND the secondary (meta) model assigns a probability above a chosen threshold that the signal is a good one. That probability can also size the bet: the more confident the meta-model, the larger the position.
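The four workflow steps can be sketched end to end in plain Python. This is a minimal, dependency-free illustration built on several assumptions of our own:

- A moving-average crossover stands in for the primary model.
- Meta-labels come from a fixed 5-bar forward return, which is a deliberate simplification.
- A hand-rolled logistic regression stands in for XGBoost.
- The functions `primary_signals`, `meta_labels`, `rolling_vol`, `fit_logistic`, and `bet_size` are hypothetical names for illustration only. The same applies to the 0.55 confidence threshold and the linear sizing rule.

```python
import math
import random

HORIZON = 5  # bars used to judge each signal (illustrative choice)


def primary_signals(prices, fast=5, slow=20):
    """Step 1, primary model: moving-average crossover.
    +1 = long, -1 = short, 0 while the slow average is still warming up."""
    signals = []
    for t in range(len(prices)):
        if t + 1 < slow:
            signals.append(0)
            continue
        fast_ma = sum(prices[t + 1 - fast:t + 1]) / fast
        slow_ma = sum(prices[t + 1 - slow:t + 1]) / slow
        signals.append(1 if fast_ma > slow_ma else -1)
    return signals


def meta_labels(prices, signals, horizon=HORIZON, min_return=0.0):
    """Step 2, meta-labels: 1 if the return earned in the signal's direction
    over the next `horizon` bars beats `min_return`, else 0."""
    labels = {}
    for t, side in enumerate(signals):
        if side == 0 or t + horizon >= len(prices):
            continue  # no signal, or not enough future data to judge it
        forward_return = prices[t + horizon] / prices[t] - 1.0
        labels[t] = 1 if side * forward_return > min_return else 0
    return labels


def rolling_vol(prices, t, window=20):
    """Meta-feature: std. dev. of the last `window` one-bar returns ending at t (needs t >= window)."""
    rets = [prices[i] / prices[i - 1] - 1.0 for i in range(t - window + 1, t + 1)]
    mean = sum(rets) / window
    return math.sqrt(sum((r - mean) ** 2 for r in rets) / window)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.05, epochs=50):
    """Step 3, secondary model: logistic regression fitted by plain SGD
    (a stand-in for XGBoost so the sketch has no dependencies)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            b -= lr * grad
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
    return w, b


def bet_size(side, prob, threshold=0.55):
    """Step 4, decision: trade only if the meta-model's probability clears `threshold`,
    then scale the position linearly from 0 (at the threshold) to 1 (at prob = 1)."""
    if side == 0 or prob < threshold:
        return 0.0
    return side * (prob - threshold) / (1.0 - threshold)


# Demo on a synthetic random walk: there is no real edge here, it only shows the plumbing.
random.seed(0)
prices = [100.0]
for _ in range(1999):
    prices.append(prices[-1] * math.exp(random.gauss(0.0002, 0.01)))

signals = primary_signals(prices)
labels = meta_labels(prices, signals)
events = sorted(t for t in labels if t >= 20)  # rolling_vol needs 20 past returns
X = [[signals[t], 100.0 * rolling_vol(prices, t)] for t in events]  # features: side, vol in %
y = [labels[t] for t in events]

split = int(0.7 * len(events))  # chronological split: never train on the future
# Purge training events whose label window reaches into the test period.
train_idx = [i for i in range(split) if events[i] + HORIZON < events[split]]
w, b = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])

sizes = [bet_size(signals[t], predict_proba(w, b, x)) for t, x in zip(events[split:], X[split:])]
print(f"trades taken: {sum(s != 0 for s in sizes)} of {len(sizes)} signals")
```

On a pure random walk there is no edge for the meta-model to find, so the printed trade count says nothing about profitability. The point is the flow of information: the primary model picks the side, and the meta-model only filters and scales. The purge step drops training events whose 5-bar label window overlaps the test period, so the secondary model never learns from test-period outcomes.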

Feature Importance for Strategy Discovery

After training a complex model like XGBoost or a Random Forest, it's crucial to understand *why* it's making its decisions. Feature importance plots (which we saw in Module 4) are essential for this.

In quantitative finance, feature importance is not just a tool for interpretation; it is a tool for discovery. Suppose a feature you engineered (e.g., "the 5-day rolling volatility of social media sentiment") consistently ranks among the top predictors, across cross-validation folds and on held-out data rather than in a single backtest. In that case you may have discovered a new, exploitable alpha signal.
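One model-agnostic way to check whether an engineered feature really carries signal is permutation importance. It is the idea behind what López de Prado calls mean decrease accuracy (MDA): shuffle one feature on held-out data and measure how much the score drops. The sketch below is illustrative and rests on these assumptions:

- `permutation_importance` and `fit_nearest_centroid` are hypothetical helpers.
- A toy nearest-centroid classifier stands in for XGBoost or a Random Forest.
- The data are synthetic, with one informative column and two pure-noise columns.

```python
import random


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled.
    Shuffling breaks the feature's link to the label but keeps its distribution."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def fit_nearest_centroid(X, y):
    """Toy classifier standing in for XGBoost / Random Forest: predict the class
    whose training-set mean is closest in squared Euclidean distance."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(X_new):
        return [min(centroids, key=lambda c: sum((a - m) ** 2 for a, m in zip(x, centroids[c])))
                for x in X_new]

    return predict


# Synthetic data: only "signal" drives the label; the other two columns are pure noise.
data_rng = random.Random(42)
X, y = [], []
for _ in range(2000):
    signal, noise_a, noise_b = (data_rng.gauss(0, 1) for _ in range(3))
    X.append([signal, noise_a, noise_b])
    y.append(1 if signal + 0.5 * data_rng.gauss(0, 1) > 0 else 0)

model = fit_nearest_centroid(X[:1500], y[:1500])                 # train on the first 75%
importances = permutation_importance(model, X[1500:], y[1500:])  # score on held-out data
for name, imp in zip(["signal", "noise_a", "noise_b"], importances):
    print(f"{name:>8}: {imp:+.3f}")
```

Because the score is measured on data the model never trained on, the noise columns should land near zero while the informative column shows a clear drop. In practice you would repeat this in every cross-validation fold and keep only features that rank highly in all of them. For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` implements the same idea.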

Module 5 Complete!

Congratulations! You have completed your tour of applying machine learning to time series analysis. You understand how to engineer features, validate your models correctly, and apply both classic ensembles and deep learning models to forecasting problems.
