
Machine Learning Strategy Complete Workflow

This document provides the complete development workflow for machine learning quantitative strategies, from feature engineering, label generation, and model training to backtest validation and live deployment.

ML Strategy Development -- Recommended with AI Assistant

ML strategy development involves feature engineering, label design, model training, and more. After installing FinLab Skill, the AI coding assistant can provide code examples and debugging help at every stage.

Workflow Overview

graph TD
    A[Raw Data] --> B[Feature Engineering<br/>finlab.ml.feature]
    B --> C[Label Generation<br/>finlab.ml.label]
    C --> D[Dataset Splitting]
    D --> E[Model Training<br/>finlab.ml.qlib]
    E --> F[Predict Position Weights]
    F --> G[Backtest Validation]
    G --> H{Performance Acceptable?}
    H -->|No| I[Adjust Features / Model]
    I --> B
    H -->|Yes| J[Out-of-Sample Testing]
    J --> K{Validation Passed?}
    K -->|No| I
    K -->|Yes| L[Deploy to Live Trading]

Stage 1: Feature Engineering

Roughly 80% of a machine-learning strategy's success comes from feature engineering. FinLab provides powerful feature engineering tools.

1.1 Load and Merge Fundamental Features

from finlab import data
from finlab.ml import feature as mlf

# Load fundamental data
pb_ratio = data.get('price_earning_ratio:股價淨值比')
pe_ratio = data.get('price_earning_ratio:本益比')
roe = data.get('fundamental_features:股東權益報酬率')
roa = data.get('fundamental_features:資產報酬率')

# Combine into feature set
fundamental_features = mlf.combine({
    'pb': pb_ratio,
    'pe': pe_ratio,
    'roe': roe,
    'roa': roa
}, resample='W')  # Resample to weekly frequency

print(fundamental_features.head())
# Output:
#                                        pb     pe    roe    roa
# (2010-01-04 00:00:00, '1101')        1.47  18.85   7.80   3.21
# (2010-01-04 00:00:00, '1102')        1.44  14.58   9.87   4.15

1.2 Add Technical Indicator Features

# Method 1: Use random technical indicators (explore best indicators)
ta_features = mlf.combine({
    'talib': mlf.ta(mlf.ta_names(n=5))  # Randomly generate 5 parameter configurations per indicator
}, resample='W')

# Method 2: Specify particular technical indicators
from finlab import data

close = data.get('price:收盤價')
volume = data.get('price:成交股數')

specific_ta = mlf.combine({
    'rsi': close.ta.RSI(timeperiod=14),
    'macd': close.ta.MACD(),
    'bbands': close.ta.BBANDS(timeperiod=20),
    'obv': volume.ta.OBV(close)
}, resample='W')

print(f"Technical indicator feature count: {ta_features.shape[1]}")  # e.g., 450 features

1.3 Add Custom Features

# Revenue-related features
rev = data.get('monthly_revenue:當月營收')
rev_yoy = data.get('monthly_revenue:去年同月增減(%)')

custom_features = mlf.combine({
    'rev_ma3': rev.average(3),          # Trailing 3-month revenue moving average
    'rev_ma12': rev.average(12),        # Trailing 12-month revenue moving average
    'rev_momentum': rev.average(3) / rev.average(12),  # Revenue momentum
    'rev_yoy': rev_yoy,                 # Revenue YoY growth
}, resample='W')

print(custom_features.head())

1.4 Merge All Features

# Merge all feature sets
all_features = mlf.combine({
    'fundamental': fundamental_features,
    'technical': ta_features,
    'custom': custom_features
}, resample='W')

print(f"Total feature count: {all_features.shape[1]}")  # e.g., 470 features
print(f"Row count: {all_features.shape[0]}")  # e.g., 150,000 rows

# Check missing values
missing_ratio = all_features.isna().sum() / len(all_features)
print(f"Missing value ratio:\n{missing_ratio[missing_ratio > 0.5]}")  # Show features with > 50% missing

# Remove features with too many missing values
all_features = all_features.loc[:, missing_ratio < 0.5]
print(f"Filtered feature count: {all_features.shape[1]}")

Stage 2: Label Generation

Labels define our prediction target. finlab.ml.label provides various label generation functions, all accepting features.index (MultiIndex) as the first argument.

2.1 Predict Future Returns

from finlab.ml import label as mll

# Predict 1-week future return (most common)
label = mll.return_percentage(all_features.index, resample='W', period=1)

print(label.head())
# Output:
# datetime             instrument
# 2010-01-04 00:00:00  1101          0.032
#                      1102         -0.015
#                      1103          0.021
# dtype: float64

# Check label distribution
print(label.describe())
# Output:
# count    150000.00
# mean          0.005
# std           0.087
# min          -0.450
# 25%          -0.042
# 50%           0.002
# 75%           0.051
# max           0.520

2.2 Excess Return Labels

# Excess return over market median for the same period
label_excess_median = mll.excess_over_median(all_features.index, resample='W', period=1)

# Excess return over market mean for the same period
label_excess_mean = mll.excess_over_mean(all_features.index, resample='W', period=1)

print(label_excess_median.describe())

2.3 Other Label Types

# Day trading return (open-to-close change)
label_daytrading = mll.daytrading_percentage(all_features.index)

# Risk metric: Maximum Adverse Excursion (max decline during holding period)
label_mae = mll.maximum_adverse_excursion(all_features.index, period=5)

# Risk metric: Maximum Favorable Excursion (max gain during holding period)
label_mfe = mll.maximum_favorable_excursion(all_features.index, period=5)

# Multi-period prediction (predict different time horizons)
label_1w = mll.return_percentage(all_features.index, resample='W', period=1)
label_2w = mll.return_percentage(all_features.index, resample='W', period=2)
label_4w = mll.return_percentage(all_features.index, resample='W', period=4)

Label Selection Recommendations

  • return_percentage: Most commonly used, directly predicts returns
  • excess_over_median: Predicts relative performance, reduces market-wide movement impact
  • daytrading_percentage: Suitable for day trading strategies
  • maximum_adverse_excursion: Suitable for risk management models
  • period should match the strategy rebalancing frequency: e.g., resample='W', period=1 predicts 1-week return
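Conceptually, a forward-return label like return_percentage can be sketched with plain pandas (a simplified illustration on toy prices, not FinLab's actual implementation; the symbols and values are made up):

```python
import pandas as pd

# Toy daily closes for two symbols
dates = pd.date_range('2024-01-01', periods=15, freq='D')
close = pd.DataFrame({
    '1101': [100.0 + i for i in range(15)],
    '2330': [50.0 + i for i in range(15)],
}, index=dates)

weekly = close.resample('W').last()            # resample='W'
fwd = weekly.pct_change(periods=1).shift(-1)   # next-period return (period=1)
label = fwd.stack().dropna()                   # MultiIndex (datetime, instrument)
print(label.head())
```

The shift(-1) is what makes this a prediction target: each row holds the return realized over the following period, so the last period has no label (which is why recent rows are NaN when the prediction period is long).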

Stage 3: Dataset Preparation & Splitting

3.1 Align Features and Labels

Features (all_features) and labels (label) share the same MultiIndex (datetime, instrument) and can be split directly.

# Select label
label = mll.return_percentage(all_features.index, resample='W', period=1)

# Check alignment
print(f"Feature row count: {len(all_features)}")
print(f"Label row count: {len(label)}")
print(f"Label NaN ratio: {label.isna().mean():.2%}")

3.2 Split Training and Test Sets

# Use time-based splitting (strictly avoid data leakage)
is_train = all_features.index.get_level_values('datetime') < '2023-01-01'

X_train = all_features[is_train]
y_train = label[is_train]
X_test = all_features[~is_train]

print(f"Training set: {len(X_train)} rows ({X_train.index.get_level_values(0).min()} ~ {X_train.index.get_level_values(0).max()})")
print(f"Test set: {len(X_test)} rows ({X_test.index.get_level_values(0).min()} ~ {X_test.index.get_level_values(0).max()})")
print(f"Training feature shape: {X_train.shape}")

Time-Based Splitting is Critical

You must use time-based splitting, not random splitting. Random splitting causes data leakage -- the model sees future data to predict the past, leading to artificially inflated backtest performance.
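To make the leakage rule concrete, here is a minimal self-contained sketch on toy data (not FinLab's API) that verifies every training date strictly precedes every test date:

```python
import numpy as np
import pandas as pd

# Toy panel mirroring the (datetime, instrument) MultiIndex layout
idx = pd.MultiIndex.from_product(
    [pd.date_range('2022-01-02', periods=60, freq='W'), ['1101', '2330']],
    names=['datetime', 'instrument'])
features = pd.DataFrame({'f1': np.arange(len(idx), dtype=float)}, index=idx)

# Time-based split: a single cutoff date, never a random shuffle
is_train = features.index.get_level_values('datetime') < '2023-01-01'
X_train, X_test = features[is_train], features[~is_train]

# Every training date strictly precedes every test date -- no leakage
assert X_train.index.get_level_values(0).max() < X_test.index.get_level_values(0).min()
```

A random split would scatter 2023 rows into the training set, letting the model "see the future" of the very dates it is later tested on.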


Stage 4: Model Training

4.1 Using the LightGBM Model

import finlab.ml.qlib as q

# Create and train LightGBM model
model = q.LGBModel()
model.fit(X_train, y_train)

print("Training complete!")

4.2 Using Other Models

# XGBoost
model_xgb = q.XGBModel()
model_xgb.fit(X_train, y_train)

# CatBoost
model_cat = q.CatBoostModel()
model_cat.fit(X_train, y_train)

# Linear model (fast validation, less prone to overfitting)
model_linear = q.LinearModel()
model_linear.fit(X_train, y_train)

# Deep learning
model_dnn = q.DNNModel()
model_dnn.fit(X_train, y_train)

# List all available models
models = q.get_models()
print(list(models.keys()))

4.3 Multi-Model Comparison

import finlab.ml.qlib as q
from finlab.backtest import sim

# Quick comparison of multiple models
results = {}
for name, ModelClass in [('LightGBM', q.LGBModel), ('XGBoost', q.XGBModel), ('Linear', q.LinearModel)]:
    model = ModelClass()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    position = y_pred.is_largest(30)
    report = sim(position, resample='W', name=f"ML {name}", upload=False)
    results[name] = report

# Compare performance
for name, report in results.items():
    stats = report.get_stats()
    print(f"{name}: annualized return {stats['daily_mean']:.2%}, Sharpe ratio {stats['daily_sharpe']:.2f}")

Stage 5: Prediction & Position Weight Generation

model.predict() returns a FinlabDataFrame (index = dates, columns = stock symbols), which can directly use FinlabDataFrame methods to convert into positions.

5.1 Generate Predictions

# Predict on test set
y_pred = model.predict(X_test)

print(y_pred.head())
# Output (FinlabDataFrame, index=dates, columns=stock symbols):
#              1101    1102    1103    1216    2330
# 2023-01-06  0.032   0.015  -0.008   0.045   0.023
# 2023-01-13  0.018   0.027   0.003   0.012   0.041

# Check prediction distribution
print(y_pred.stack().describe())

5.2 Convert to Positions

from finlab.backtest import sim

# Method 1: Top N stock selection (buy the N stocks with highest predictions)
position_topn = y_pred.is_largest(30)

# Method 2: Top 20% selection
position_quantile = y_pred > y_pred.quantile(0.8)

# Method 3: Allocate weights proportional to prediction values
# (clip negatives, then normalize each date's row so weights sum to 1)
position_weighted = y_pred.clip(lower=0)
position_weighted = position_weighted.div(position_weighted.sum(axis=1), axis=0)

print(f"Top 30 strategy average holdings: {position_topn.sum(axis=1).mean():.1f}")
print(f"Top 20% strategy average holdings: {position_quantile.sum(axis=1).mean():.1f}")

Stage 6: Backtest Validation

6.1 Run Backtest

from finlab.backtest import sim

# Backtest Top 30 strategy
report_topn = sim(
    position_topn,
    resample='W',
    name="ML Top 30 Strategy",
    upload=False
)

# Backtest weighted strategy
report_weighted = sim(
    position_weighted,
    resample='W',
    name="ML Weighted Strategy",
    upload=False
)

# Display performance
report_topn.display()

6.2 Performance Comparison

import pandas as pd

stats_topn = report_topn.get_stats()
stats_weighted = report_weighted.get_stats()

comparison = pd.DataFrame({
    'Top 30 Strategy': [
        stats_topn['daily_mean'],
        stats_topn['daily_sharpe'],
        stats_topn['max_drawdown'],
        stats_topn['win_ratio']
    ],
    'Weighted Strategy': [
        stats_weighted['daily_mean'],
        stats_weighted['daily_sharpe'],
        stats_weighted['max_drawdown'],
        stats_weighted['win_ratio']
    ]
}, index=['Annualized Return', 'Sharpe Ratio', 'Max Drawdown', 'Win Rate'])

print(comparison)

6.3 In-Depth Analysis

# Liquidity analysis
report_topn.run_analysis('LiquidityAnalysis', required_volume=100000)

# MAE/MFE analysis
report_topn.display_mae_mfe_analysis()

# Period stability
report_topn.run_analysis('PeriodStatsAnalysis')

# Alpha/Beta
report_topn.run_analysis('AlphaBetaAnalysis')

Stage 7: Feature Engineering Iteration & Optimization

7.1 Reduce Feature Count

# Strategy 1: Use fewer technical indicators
features_small = mlf.combine({
    'pb': pb_ratio,
    'pe': pe_ratio,
    'roe': roe,
    'talib': mlf.ta(mlf.ta_names(n=1)[:20])  # Only take the first 20 indicators
}, resample='W')

label_small = mll.return_percentage(features_small.index, resample='W', period=1)

is_train_small = features_small.index.get_level_values('datetime') < '2023-01-01'
model_v2 = q.LGBModel()
model_v2.fit(features_small[is_train_small], label_small[is_train_small])

y_pred_v2 = model_v2.predict(features_small[~is_train_small])
position_v2 = y_pred_v2.is_largest(30)
report_v2 = sim(position_v2, resample='W', name="ML V2 Reduced Features", upload=False)
report_v2.display()

7.2 Adjust Label Prediction Period

# Test different prediction periods
for period in [1, 2, 4]:
    label_n = mll.return_percentage(all_features.index, resample='W', period=period)

    model_n = q.LGBModel()
    model_n.fit(X_train, label_n[is_train])

    y_pred_n = model_n.predict(X_test)
    position_n = y_pred_n.is_largest(30)
    report_n = sim(position_n, resample='W', name=f"ML Predict {period}W", upload=False)

    stats = report_n.get_stats()
    print(f"Predict {period}W: annualized return {stats['daily_mean']:.2%}, Sharpe ratio {stats['daily_sharpe']:.2f}")

7.3 Try Different Label Types

# Compare return vs excess return labels
label_return = mll.return_percentage(all_features.index, resample='W', period=1)
label_excess = mll.excess_over_median(all_features.index, resample='W', period=1)

for label_name, label_data in [('Return', label_return), ('Excess Return', label_excess)]:
    model_cmp = q.LGBModel()
    model_cmp.fit(X_train, label_data[is_train])

    y_pred_cmp = model_cmp.predict(X_test)
    position_cmp = y_pred_cmp.is_largest(30)
    report_cmp = sim(position_cmp, resample='W', name=f"ML {label_name}", upload=False)

    stats = report_cmp.get_stats()
    print(f"{label_name}: annualized return {stats['daily_mean']:.2%}, Sharpe ratio {stats['daily_sharpe']:.2f}")

Stage 8: Live Deployment

8.1 Build a Real-Time Prediction Pipeline

from finlab import data
from finlab.ml import feature as mlf, label as mll
import finlab.ml.qlib as q
import pickle

# 1. Train the full model (using all historical data)
features = mlf.combine({
    'pb': data.get('price_earning_ratio:股價淨值比'),
    'pe': data.get('price_earning_ratio:本益比'),
    'roe': data.get('fundamental_features:股東權益報酬率'),
    'talib': mlf.ta(mlf.ta_names(n=1)[:20])
}, resample='W')

label = mll.return_percentage(features.index, resample='W', period=1)

model = q.LGBModel()
model.fit(features, label)

# 2. Save the model
with open('ml_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# 3. Load model and predict latest positions
with open('ml_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

y_pred = loaded_model.predict(features)

# 4. Get latest positions
position = y_pred.is_largest(30)
latest_position = position.iloc[-1]
latest_position = latest_position[latest_position > 0].sort_values(ascending=False)

print("Latest position recommendations:")
print(latest_position)

8.2 Automated Trading Setup

from finlab.backtest import sim
from finlab.online.sinopac_account import SinopacAccount
from finlab.online.order_executor import OrderExecutor

# Create a script to run weekly (every Monday)
def weekly_rebalance():
    # Recalculate features
    features = mlf.combine({
        'pb': data.get('price_earning_ratio:股價淨值比'),
        'pe': data.get('price_earning_ratio:本益比'),
        'roe': data.get('fundamental_features:股東權益報酬率'),
        'talib': mlf.ta(mlf.ta_names(n=1)[:20])
    }, resample='W')

    # Load model and predict
    with open('ml_model.pkl', 'rb') as f:
        model = pickle.load(f)

    y_pred = model.predict(features)
    position = y_pred.is_largest(30)

    # Use sim to generate report
    report = sim(position, resample='W', upload=False)

    # Execute orders
    account = SinopacAccount(simulation=False)
    executor = OrderExecutor(report=report, account=account, fund=1000000)
    executor.execute()

# Use cron or a scheduling tool to run weekly_rebalance() periodically
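If you prefer a pure-Python guard over cron, a minimal sketch looks like this (the Monday check is an assumption matching the weekly resample; should_rebalance is a hypothetical helper, not part of FinLab):

```python
import datetime

def should_rebalance(today=None):
    """Return True on Mondays, when the weekly rebalance script should run."""
    today = today or datetime.date.today()
    return today.weekday() == 0  # Monday == 0

# Run this script daily (e.g., from a simple scheduler) and gate the work:
# if should_rebalance():
#     weekly_rebalance()  # defined above
```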

Complete Code Summary

# =============================================================================
# Machine Learning Strategy Complete Example
# =============================================================================

from finlab import data
from finlab.ml import feature as mlf
from finlab.ml import label as mll
import finlab.ml.qlib as q
from finlab.backtest import sim

# 1. Feature Engineering
close = data.get('price:收盤價')
pb = data.get('price_earning_ratio:股價淨值比')
pe = data.get('price_earning_ratio:本益比')
rev = data.get('monthly_revenue:當月營收')

features = mlf.combine({
    'pb': pb,
    'pe': pe,
    'rev_ma3': rev.average(3),
    'rev_ma12': rev.average(12),
    'talib': mlf.ta(mlf.ta_names(n=1)[:20])
}, resample='W')

# 2. Label Generation
label = mll.return_percentage(features.index, resample='W', period=1)

# 3. Data Splitting
is_train = features.index.get_level_values('datetime') < '2023-01-01'
X_train = features[is_train]
y_train = label[is_train]
X_test = features[~is_train]

# 4. Model Training
model = q.LGBModel()
model.fit(X_train, y_train)

# 5. Prediction & Positions
y_pred = model.predict(X_test)
position = y_pred.is_largest(30)

# 6. Backtest
report = sim(position, resample='W', name="ML Strategy", upload=False)
report.display()

# 7. Analysis
report.run_analysis('LiquidityAnalysis')
report.display_mae_mfe_analysis()

print("Done!")

Key Takeaways

Feature Engineering Stage

  • Use diverse feature sources (fundamental, technical, custom)
  • Use mlf.combine() to unify merging, ensuring MultiIndex alignment
  • Check and handle missing values
  • Control feature count (too many leads to overfitting)

Label Generation Stage

  • Use mll.return_percentage() and similar functions, passing features.index
  • resample parameter should match features
  • Prediction period (period) should be reasonable (too short = noisy, too long = hard to predict)

Model Training Stage

  • Use wrapper classes like q.LGBModel(), with fit() + predict()
  • Time-based train/test split (not random split)
  • Start with a simple model (LinearModel) to establish a baseline

Backtest Validation Stage

  • predict() returns FinlabDataFrame; use is_largest() to convert to positions
  • Out-of-sample testing is mandatory
  • Run in-depth analysis (liquidity, MAE/MFE)

Live Deployment Stage

  • Retrain the model periodically (e.g., quarterly)
  • Monitor divergence between live and backtest performance
  • Set up performance alert mechanisms
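The periodic-retraining recommendation can be sketched as a simple walk-forward schedule (a generic illustration; the quarterly frequency and date range are assumptions, and the model-fitting step is shown only in comments):

```python
import pandas as pd

def retrain_windows(first, last, freq='QS'):
    """Yield (fit_until, use_until) pairs for periodic retraining:
    fit on data before fit_until, keep using that model until use_until."""
    bounds = pd.date_range(first, last, freq=freq)  # quarter starts
    return list(zip(bounds[:-1], bounds[1:]))

windows = retrain_windows('2023-01-01', '2024-01-01')
for fit_until, use_until in windows:
    # train_mask = features.index.get_level_values('datetime') < fit_until
    # model = q.LGBModel(); model.fit(features[train_mask], label[train_mask])
    pass
```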

Common Error Handling Checklist

During ML strategy development, the following are key error checkpoints:

Stage 1: Feature Engineering

Common Errors:

  • resample mismatch between features and labels
  • Too many missing values resulting in insufficient training data
  • Look-ahead bias (using future data to predict the past)

Validation Methods:

try:
    # 1. Build features
    features = mlf.combine({
        'pb': pb,
        'pe': pe,
        'rev_ma3': rev.average(3)
    }, resample='W')

    if features.empty:
        raise ValueError("❌ Feature DataFrame is empty")

    # 2. Check missing value ratio
    missing_ratio = features.isna().sum() / len(features)
    high_missing_cols = missing_ratio[missing_ratio > 0.3].index.tolist()

    if high_missing_cols:
        print(f"⚠️  Warning: the following features have > 30% missing values: {high_missing_cols}")
        print("Suggestion: drop these features or apply forward fill")

    # 3. Check date range
    print(f"Feature date range: {features.index.get_level_values(0).min()} ~ {features.index.get_level_values(0).max()}")

    # 4. Check feature count
    num_features = features.shape[1]
    if num_features > 500:
        print(f"⚠️  Warning: too many features ({num_features}), may cause overfitting")
        print("Suggestion: prefer < 200 features")

    print(f"✅ Feature engineering complete: {num_features} features, {len(features)} rows")

except KeyError as e:
    print(f"❌ Invalid data table name: {e}")
    print("Please visit https://ai.finlab.tw/database to confirm the correct name")

except ValueError as e:
    print(f"❌ Feature validation failed: {e}")

Detailed Error Handling: See Data Download Error Handling

Stage 2: Label Generation

Common Errors:

  • resample mismatch between labels and features
  • Passing an incorrect index (should pass features.index)
  • Unreasonable prediction period setting

Validation Methods:

# Generate labels
label = mll.return_percentage(features.index, resample='W', period=1)

# Check label distribution
print("Label statistics:")
print(label.describe())

# Check label missing values
nan_ratio = label.isna().mean()
if nan_ratio > 0.1:
    print(f"⚠️  Warning: label missing ratio {nan_ratio:.1%} > 10%")
    print("Likely cause: prediction period too long, recent rows have no label")

print(f"✅ Label generation complete: {len(label)} rows")

Stage 3: Model Training

Common Errors:

  • Insufficient training data (< 1000 rows)
  • Using random split instead of time-based split
  • Overfitting (test set performance much worse than training set)

Validation Methods:

# Split train/test sets
is_train = features.index.get_level_values('datetime') < '2023-01-01'
X_train = features[is_train]
y_train = label[is_train]
X_test = features[~is_train]

# 1. Check data volume
print(f"Training set: {len(X_train)} rows")
print(f"Test set: {len(X_test)} rows")

if len(X_train) < 1000:
    print("⚠️  Warning: insufficient training data (< 1000 rows)")
    print("Suggestion: extend the historical range or lower the resample frequency")

if len(X_test) < 100:
    print("⚠️  Warning: too few test rows (< 100)")

# 2. Check date ordering
train_last = X_train.index.get_level_values(0).max()
test_first = X_test.index.get_level_values(0).min()

if train_last >= test_first:
    raise ValueError(
        f"❌ Training and test set dates overlap!\n"
        f"   Training set last date: {train_last}\n"
        f"   Test set first date: {test_first}\n"
        f"   This will cause data leakage"
    )

print(f"✅ Dataset split is correct")

# 3. Model training
try:
    model = q.LGBModel()
    model.fit(X_train, y_train)
    print(f"✅ Model training complete")

except Exception as e:
    print(f"❌ Model training failed: {e}")
    print("Please check:")
    print("1. Whether features contain NaN or Inf")
    print("2. Whether labels are numeric")
    print("3. Whether related packages are installed correctly (pip install lightgbm / xgboost)")
    raise

Stage 4: Prediction & Backtesting

Common Errors:

  • Prediction results are all NaN
  • Position DataFrame format is incorrect
  • Backtest has no trade records

Validation Methods:

# 1. Prediction
y_pred = model.predict(X_test)

if y_pred.isna().all().all():
    raise ValueError("❌ All predictions are NaN")

# Check prediction distribution
print(f"Prediction range: {y_pred.min().min():.4f} ~ {y_pred.max().max():.4f}")
print(f"Prediction mean: {y_pred.stack().mean():.4f}")

# 2. Generate positions
position = y_pred.is_largest(30)

if position.empty:
    raise ValueError("❌ Position DataFrame is empty")

holding_count = position.sum(axis=1).mean()
if holding_count < 10:
    print(f"⚠️  Warning: average holding count {holding_count:.1f} < 10, possibly too few")

print(f"✅ Positions generated successfully: average {holding_count:.1f} holdings")

# 3. Backtest
try:
    report = sim(position, resample='W', name="ML Strategy", upload=False)
    print(f"✅ Backtest succeeded")

    stats = report.get_stats()
    print(f"   Annualized return: {stats['daily_mean']:.2%}")
    print(f"   Sharpe ratio: {stats['daily_sharpe']:.2f}")

except Exception as e:
    print(f"❌ Backtest failed: {e}")
    print("Please check:")
    print("1. Whether position.index is a DatetimeIndex")
    print("2. Whether position.columns are stock symbols")
    raise

Risks Specific to ML Strategies

Compared to traditional strategies, ML strategies require extra attention to:

  • Data leakage -- using future data to predict the past
  • Overfitting -- test set performance much worse than training set
  • Model decay -- live performance degrades over time

Recommendations:

  • Strictly use time-series splitting (not random splitting)
  • Retrain the model periodically (quarterly or monthly)
  • Monitor live vs backtest divergence and set up alert mechanisms
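The divergence-monitoring recommendation can be sketched with plain pandas (toy return series; the 5-percentage-point threshold and the performance_gap helper are illustrative assumptions, not FinLab APIs):

```python
import pandas as pd

def performance_gap(live_returns, backtest_returns):
    """Difference between live and backtest cumulative growth on shared dates."""
    aligned = pd.concat({'live': live_returns, 'bt': backtest_returns}, axis=1).dropna()
    return (1 + aligned['live']).cumprod() - (1 + aligned['bt']).cumprod()

# Toy example: live earns nothing while the backtest gains 10 bps per day
dates = pd.date_range('2024-01-01', periods=60, freq='B')
bt = pd.Series(0.001, index=dates)
live = pd.Series(0.000, index=dates)

gap = performance_gap(live, bt)
if gap.iloc[-1] < -0.05:
    print("alert: live trails backtest by more than 5 percentage points")
```

In production, live_returns would come from account statements and backtest_returns from the latest sim() report over the same dates; the alert can feed whatever notification channel you already use.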


Reference Resources