Machine Learning Strategy Example (US Market)
This example demonstrates how to use machine learning for stock selection. It uses an XGBoost model (integrated via qlib) to predict future stock returns, then combines model predictions with a technical trend filter to construct a portfolio.
Feature Engineering
The strategy uses Price-to-Book ratio (PB) and Price-to-Earnings ratio (PE) as features. These value metrics capture how the market prices a company relative to its fundamentals. The combine() function from finlab.ml.feature (imported below as mlf) assembles the feature matrix and resamples it to weekly frequency.
from finlab import data
from finlab.ml import feature as mlf
data.set_market('us')
features = mlf.combine({
'pb': data.get('us_key_metrics:pbRatio'),
'pe': data.get('us_key_metrics:peRatio')
}, resample='W')
features.head()
The resulting features DataFrame has a MultiIndex with (datetime, symbol) as the index and feature names as columns. Each row represents one stock at one point in time.
Label Definition
The label is the forward 12-week return -- the percentage price change over the next 12 weeks from each observation point. This is what the model will learn to predict.
from finlab.ml import label as mll
label = mll.return_percentage(features.index, resample='W', period=12)
Important: The label uses future prices that would not be known at the time of prediction. This is intentional -- the model learns to map current features to future outcomes. During backtesting, predictions are only made on out-of-sample data to avoid look-ahead bias.
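To make the label definition concrete, the following is a minimal pure-Python sketch of a forward return over a fixed number of periods. It is not finlab's implementation of return_percentage; the function name forward_return and the sample prices are hypothetical, and a 2-period horizon stands in for the 12-week horizon to keep the example short.

```python
from typing import List, Optional


def forward_return(prices: List[float], period: int) -> List[Optional[float]]:
    """Fractional price change from each observation to `period` steps later."""
    out: List[Optional[float]] = []
    for i, price in enumerate(prices):
        j = i + period
        # The last `period` observations have no future price yet, so no label.
        out.append(prices[j] / price - 1.0 if j < len(prices) else None)
    return out


# Hypothetical weekly closing prices for one stock.
weekly_close = [100.0, 102.0, 101.0, 105.0, 110.0]
labels = forward_return(weekly_close, period=2)
```

The trailing observations receive no label because their future price is not yet available, which is why the most recent rows of a dataset cannot be used for training.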
Model Training and Prediction
The data is split at 2020-01-01: observations dated before it form the training set, and observations from that date onward form the test set. The qlib XGBModel wrapper handles the XGBoost training and prediction pipeline.
import finlab.ml.qlib as q
is_train = features.index.get_level_values('datetime') < '2020-01-01'
X_train, y_train = features[is_train], label[is_train]
X_test = features[~is_train]
model = q.XGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
The y_pred output is a Series with the same MultiIndex as X_test, containing the model's predicted future return for each stock-date pair.
Portfolio Construction and Backtesting
The model's predictions are combined with a 60-day moving average trend filter: only stocks whose price is above the 60-day MA are eligible for selection. Among eligible stocks, the top 10 by predicted return are held in the portfolio, rebalanced quarterly.
from finlab.backtest import sim
from finlab.market import USMarket
close = data.get('price:adj_close')
close = close[close.index.dayofweek < 5] # remove weekends
position = y_pred[close > close.average(60)].is_largest(10)
report = sim(position, resample='Q', market=USMarket(), fee_ratio=0.001, tax_ratio=0)
report.display()
Key Concepts
Feature Selection
PB and PE are deliberately simple features for this introductory example. In practice, you can expand the feature set with:
- Growth metrics: Revenue growth, earnings growth (us_income_statement:revenue, us_income_statement:netIncome)
- Profitability: ROE, operating margin (us_key_metrics:roe, us_income_statement:operatingIncome)
- Technical indicators: RSI, MACD, moving average ratios
- Momentum: 6-month and 12-month price returns
More features can improve predictive power but also increase the risk of overfitting. Cross-validation and feature importance analysis are essential when scaling up.
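For time-ordered data, cross-validation is usually done with walk-forward folds, where each validation window comes strictly after its training window. The sketch below shows one way to generate such folds in plain Python. The function walk_forward_splits and its parameters are hypothetical and not part of finlab or qlib.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    dates: List[str], n_folds: int, min_train: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_dates, validation_dates); training always precedes validation."""
    unique = sorted(set(dates))
    fold_size = (len(unique) - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough dates for the requested number of folds")
    for k in range(n_folds):
        cut = min_train + k * fold_size
        # Expanding training window, fixed-size validation window right after it.
        yield unique[:cut], unique[cut:cut + fold_size]


# Hypothetical weekly observation labels (sortable as strings).
weeks = [f"2019-W{w:02d}" for w in range(1, 13)]
folds = list(walk_forward_splits(weeks, n_folds=3, min_train=6))
```

Each fold can then be used to fit a model on the training dates and score it on the validation dates, which gives a rough check on whether added features help out of sample.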
Train/Test Split
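The temporal split (< '2020-01-01') is critical. Unlike ML tasks with independent samples, where random splitting is acceptable, financial time series must be split chronologically to prevent data leakage. All training rows are dated before the split. Note that the labels of the final 12 training weeks are computed from prices after the split date, so a stricter setup would drop those rows from training.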
Trend Filter
The close > close.average(60) filter serves two purposes:
- Reduces exposure to downtrending stocks -- even if the model predicts high returns, a stock in a downtrend may continue falling before the prediction materializes.
- Aims to improve risk-adjusted returns -- pairing a fundamental/ML signal with technical trend confirmation is a common technique in quantitative finance, though the benefit should be verified in the backtest.
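The selection logic (filter by trend, then rank by prediction) can be sketched in plain Python as follows. This only illustrates the idea and is not how finlab evaluates close.average(60) or is_largest(10). The helper names, tickers, and prices are hypothetical, and a 3-period window with a top-1 pick stands in for the 60-day window and top-10 portfolio.

```python
from typing import Dict, List


def moving_average(prices: List[float], window: int) -> float:
    """Simple average of the most recent `window` prices."""
    if len(prices) < window:
        raise ValueError("not enough price history")
    return sum(prices[-window:]) / window


def select(
    pred: Dict[str, float],
    history: Dict[str, List[float]],
    window: int,
    top_n: int,
) -> List[str]:
    """Keep stocks trading above their moving average, then take the top predictions."""
    eligible = [
        s for s in pred
        if len(history.get(s, [])) >= window
        and history[s][-1] > moving_average(history[s], window)
    ]
    return sorted(eligible, key=lambda s: pred[s], reverse=True)[:top_n]


predictions = {"AAA": 0.12, "BBB": 0.09, "CCC": 0.05}
prices = {
    "AAA": [10.0, 9.0, 8.0],    # last price below its average: filtered out
    "BBB": [10.0, 11.0, 12.0],  # last price above its average
    "CCC": [5.0, 5.5, 6.0],     # last price above its average
}
picked = select(predictions, prices, window=3, top_n=1)
```

Here AAA has the highest predicted return but is excluded by the trend filter, so the pick falls to BBB.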
Rebalancing Frequency
resample='Q' (quarterly) roughly matches the prediction horizon, since 12 weeks is close to one quarter. The portfolio is reconstructed each quarter from fresh predictions, which lets the strategy adapt to changing market conditions while limiting turnover.
Parameters
- resample='W' (features): Weekly feature sampling balances granularity with noise reduction.
- period=12 (label): 12-week forward return as the prediction target.
- resample='Q' (backtest): Quarterly portfolio rebalancing.
- is_largest(10): Concentrated portfolio of the top 10 predictions.
- fee_ratio=0.001: Commission rate of 0.1%.
- tax_ratio=0: No transaction tax on US stock trades.
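As a rough illustration of what fee_ratio=0.001 and tax_ratio=0 mean in money terms, the sketch below computes the cost of one round trip. It assumes the fee is charged on the traded value of both the buy and the sell, and that tax applies to the sell only. That cost model is an assumption made for illustration and may differ from how sim applies these parameters; the function name and trade values are hypothetical.

```python
def round_trip_cost(
    buy_value: float,
    sell_value: float,
    fee_ratio: float = 0.001,
    tax_ratio: float = 0.0,
) -> float:
    """Total trading cost for buying and later selling a position.

    Assumed cost model: fee on both sides, tax on the sell side only.
    """
    buy_cost = buy_value * fee_ratio
    sell_cost = sell_value * (fee_ratio + tax_ratio)
    return buy_cost + sell_cost


# Hypothetical trade: buy 10,000 USD of stock, sell it later for 10,500 USD.
cost = round_trip_cost(10_000.0, 10_500.0)
net_profit = (10_500.0 - 10_000.0) - cost
```

Under these assumptions the round trip costs about 20.50 USD, so quarterly rebalancing of a 10-stock portfolio keeps total fees small relative to position size.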