Machine Learning
Library Installation
Required Installation
Install core packages (finlab, TA-Lib):
Model Libraries (Optional)
finlab.ml.qlib supports various models (LightGBM, XGBoost, CatBoost, PyTorch, TensorFlow, etc.). Install them as needed following their official documentation; most are pre-installed in Colab.
ML workflows are complex -- consider using an AI assistant
From feature engineering to model training, ML strategies involve many steps. After installing FinLab Skill, the AI coding assistant can help you select features, split datasets, train models, and interpret backtest results.
Feature Processing
Using the Combine Function to Merge Features
finlab.ml.feature.combine merges features from multiple sources (technical, fundamental, custom) with resampling support. Examples:
- Merge P/B ratio and P/E ratio into one feature set:
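from finlab import data
from finlab.ml import feature as mlf
# Merge two valuation ratios into one feature table, resampled to weekly frequency
features = mlf.combine({
    'pb': data.get('price_earning_ratio:股價淨值比'),  # price-to-book ratio
    'pe': data.get('price_earning_ratio:本益比')  # price-to-earnings ratio
}, resample='W')
features.head()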
| | pb | pe |
|---|---|---|
| (Timestamp('2010-01-04 00:00:00'), '1101') | 1.47 | 18.85 |
| (Timestamp('2010-01-04 00:00:00'), '1102') | 1.44 | 14.58 |
| (Timestamp('2010-01-04 00:00:00'), '1103') | 0.79 | 40.89 |
| (Timestamp('2010-01-04 00:00:00'), '1104') | 0.92 | 73.6 |
- Merge technical indicators into one feature set:
| | talib.HT_DCPERIOD__real__ | talib.HT_DCPHASE__real__ | talib.HT_PHASOR__quadrature__ |
|---|---|---|---|
| (Timestamp('2024-04-01 00:00:00'), '9951') | 23.4372 | 122.135 | -0.0107087 |
| (Timestamp('2024-04-01 00:00:00'), '9955') | 18.4416 | 68.0654 | -0.0168584 |
| (Timestamp('2024-04-01 00:00:00'), '9958') | 30.1035 | -10.7866 | 0.159777 |
| (Timestamp('2024-04-01 00:00:00'), '9960') | 17.5025 | 94.0009 | 0.00310615 |
| (Timestamp('2024-04-01 00:00:00'), '9962') | 23.2931 | 90.0781 | -0.0145453 |
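To make the effect of resample='W' concrete, here is a minimal plain-Python sketch of reducing daily values to one row per week by keeping the last observation of each week. It is not finlab's implementation: the helper name resample_weekly_last, the ISO-week grouping, and the sample values are all assumptions made for illustration.

```python
from datetime import date

def resample_weekly_last(daily):
    """Keep the last observation of each ISO week.

    daily: list of (date, value) pairs sorted by date.
    Returns a dict mapping (iso_year, iso_week) -> last value seen that week.
    """
    weekly = {}
    for day, value in daily:
        iso_year, iso_week, _ = day.isocalendar()
        # Later days in the same week overwrite earlier ones.
        weekly[(iso_year, iso_week)] = value
    return weekly

# Made-up daily P/B values for a single hypothetical stock.
daily_pb = [
    (date(2024, 4, 1), 1.40),  # Monday, ISO week 14
    (date(2024, 4, 3), 1.45),
    (date(2024, 4, 5), 1.47),  # Friday, last trading day of week 14
    (date(2024, 4, 8), 1.50),  # Monday, ISO week 15
]
weekly_pb = resample_weekly_last(daily_pb)
print(weekly_pb)
```

Each feature source would be reduced to the same weekly grid before the columns are joined, which is why every source in one combine call shares a single resample setting.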
Using TA-Lib to Generate Technical Indicators
finlab exposes ta and ta_names for generating TA-Lib indicator features.
ta_names Function
ta_names generates a list of TA-Lib indicator names, one per parameter configuration.
- n: the number of random parameter configurations to generate per indicator. For example, n=10 yields 10 variants of each TA-Lib indicator.
['talib.HT_DCPERIOD__real__',
'talib.HT_DCPHASE__real__',
'talib.HT_PHASOR__quadrature__',
'talib.HT_PHASOR__inphase__',
'talib.HT_SINE__sine__',
'talib.HT_SINE__leadsine__'
...
]
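The names listed above appear to follow the pattern library.FUNCTION__output__. The sketch below splits a name on that assumed pattern, which can help when grouping feature columns by indicator. The helper split_indicator_name is hypothetical and not part of finlab; how parameterized variants are encoded is not shown here, so any extra fields are returned untouched.

```python
def split_indicator_name(name):
    """Split an indicator name into (library, function, fields).

    Assumes the 'library.FUNCTION__field__...' pattern seen in the listing;
    this is an inference from the examples, not a documented format.
    """
    library, _, rest = name.partition('.')
    # Double underscores separate fields; drop the empty trailing piece.
    parts = [p for p in rest.split('__') if p]
    return library, parts[0], parts[1:]

names = [
    'talib.HT_DCPERIOD__real__',
    'talib.HT_PHASOR__quadrature__',
    'talib.HT_PHASOR__inphase__',
]
parsed = [split_indicator_name(n) for n in names]
for item in parsed:
    print(item)
```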
ta Function
ta computes the actual values for a list of names produced by ta_names.
- resample: optional. Resamples the computed indicator values to the given time frequency, for example 'W' for weekly.
from finlab.ml import feature as mlf
mlf.ta(['talib.HT_DCPERIOD__real__',
'talib.HT_DCPHASE__real__',
'talib.HT_PHASOR__quadrature__'], resample='W')
| | talib.HT_DCPERIOD__real__ | talib.HT_DCPHASE__real__ |
|---|---|---|
| (Timestamp('2024-04-07 00:00:00'), '9951') | 23.4372 | 122.135 |
| (Timestamp('2024-04-07 00:00:00'), '9955') | 18.4416 | 68.0654 |
| (Timestamp('2024-04-07 00:00:00'), '9958') | 30.1035 | -10.7866 |
| (Timestamp('2024-04-07 00:00:00'), '9960') | 17.5025 | 94.0009 |
| (Timestamp('2024-04-07 00:00:00'), '9962') | 23.2931 | 90.0781 |
Label Generation
Using the Label Function to Generate Labels
finlab.ml.label provides various return/risk label computations for training prediction models.
Predicting daytrading_percentage
This function computes the percentage change in market prices within a given period, specifically from open to close.
- resample: must match the resample used in combine so that the time periods align.
- period: the number of future periods (in units defined by resample) over which the change is computed.
from finlab.ml import feature as mlf
from finlab.ml import label as mll
feature = mlf.combine(...)
label = mll.daytrading_percentage(feature.index)
datetime    instrument
2007-04-23  0015          0.000000
            0050          0.003454
            0051          0.004874
            0052          0.006510
            01001T        0.001509
dtype: float64
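As a rough illustration of an open-to-close label, the sketch below computes close / open - 1 for each (date, instrument) key. The tickers and prices are made up, and the exact formula and timing finlab uses are assumptions here rather than something this page states.

```python
def open_to_close_change(open_price, close_price):
    """Fractional change from the open to the close of the same period."""
    return close_price / open_price - 1.0

# Hypothetical bars keyed the same way as the label index: (date, instrument).
bars = {
    ('2024-04-01', 'AAA'): {'open': 100.0, 'close': 101.0},
    ('2024-04-01', 'BBB'): {'open': 50.0, 'close': 49.0},
}

labels = {
    key: open_to_close_change(bar['open'], bar['close'])
    for key, bar in bars.items()
}
for key, value in labels.items():
    print(key, round(value, 4))
```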
Predicting N-day Future Returns
Computes the percentage price change over the next N periods, which is useful for modeling medium- to long-term performance.
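A minimal sketch of an N-period forward return, assuming the label at time t is price[t + N] / price[t] - 1. The function name forward_returns and the prices are illustrative, not finlab's API. Rows whose future price is not yet known get no label.

```python
def forward_returns(prices, period):
    """Return price[t + period] / price[t] - 1 for each t.

    Positions without a known future price are marked None so they can be
    dropped before training.
    """
    out = []
    for t, price in enumerate(prices):
        if t + period < len(prices):
            out.append(prices[t + period] / price - 1.0)
        else:
            out.append(None)
    return out

closes = [100.0, 102.0, 101.0, 105.0, 110.0]  # made-up closing prices
fwd = forward_returns(closes, period=2)
print(fwd)
```

The trailing None values matter in practice: the most recent rows have features but no label yet, so they belong to prediction, not training.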
Maximum Adverse Excursion
MAE: the maximum adverse price movement during the holding period. resample is not currently supported.
Maximum Favorable Excursion
MFE: the maximum favorable price movement during the holding period. resample is not currently supported.
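The two excursion labels can be sketched as the worst and best return reached at any point while a position is held. The sign convention below (MAE as a negative number, MFE as a positive one) is an assumption, as is the helper name excursions. The convention finlab uses is not stated here.

```python
def excursions(entry_price, path):
    """Return (mae, mfe) for a long position.

    entry_price: price at which the position is opened.
    path: prices observed during the holding period after entry.
    Both values are signed returns relative to the entry price.
    """
    returns = [p / entry_price - 1.0 for p in path]
    mae = min(min(returns), 0.0)  # worst drawdown from entry, never above 0
    mfe = max(max(returns), 0.0)  # best run-up from entry, never below 0
    return mae, mfe

# Made-up price path: dips to 95, peaks at 103.
mae, mfe = excursions(100.0, [98.0, 103.0, 95.0, 101.0])
print(round(mae, 4), round(mfe, 4))
```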
Excess Over Median
Excess return relative to the market-wide median return for the same period.
Excess Over Mean
Excess return relative to the market-wide mean return for the same period.
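The difference between the two excess-return labels shows up when one stock has an outsized move. The sketch below subtracts a cross-sectional benchmark (median or mean) from each stock's return for one period. The tickers, returns, and the helper excess_over are hypothetical.

```python
from statistics import mean, median

def excess_over(returns_by_stock, benchmark):
    """Subtract a cross-sectional benchmark from every stock's return."""
    base = benchmark(list(returns_by_stock.values()))
    return {stock: r - base for stock, r in returns_by_stock.items()}

# Made-up returns for one period; 'CCC' is an outlier.
period_returns = {'AAA': 0.02, 'BBB': 0.01, 'CCC': 0.30}

over_median = excess_over(period_returns, median)  # benchmark = 0.02
over_mean = excess_over(period_returns, mean)      # benchmark = 0.11
print(over_median)
print(over_mean)
```

With the median benchmark, 'AAA' is neutral; with the mean benchmark, the single outlier pulls the benchmark up and makes 'AAA' look like an underperformer.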
Check that the index and market settings are correct before use. Because the labels are built from the feature index, they can be combined directly with the features for model training.
Model Training with Qlib
WrapperModel adapts LightGBM, XGBoost, CatBoost, linear models, TabNet, DNN, etc. into a uniform fit/predict interface.
LGBModel
Wraps the LightGBM model.
import finlab.ml.qlib as q
# Construct X_train, y_train, X_test
model = q.LGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
How to construct X_train, y_train, X_test?
Example using data before 2020 as the training set:
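A minimal sketch of that chronological split in plain Python, using dictionaries keyed by (date, instrument) to stand in for the indexed feature table and label series. The helper split_by_date and the sample rows are assumptions for illustration. With the real objects returned by combine and the label functions, the same date comparison would be applied to the datetime level of the index.

```python
from datetime import date

def split_by_date(features, labels, cutoff):
    """Split rows chronologically so no test-period data leaks into training.

    features, labels: dicts keyed by (date, instrument).
    cutoff: first date that belongs to the test set.
    """
    X_train = {k: v for k, v in features.items() if k[0] < cutoff}
    y_train = {k: labels[k] for k in X_train}
    X_test = {k: v for k, v in features.items() if k[0] >= cutoff}
    return X_train, y_train, X_test

# Made-up rows: [pb, pe] features and a return label per (date, instrument).
features = {
    (date(2019, 12, 23), 'AAA'): [1.2, 15.0],
    (date(2019, 12, 30), 'AAA'): [1.3, 15.5],
    (date(2020, 1, 6), 'AAA'): [1.1, 14.0],
}
labels = {
    (date(2019, 12, 23), 'AAA'): 0.01,
    (date(2019, 12, 30), 'AAA'): -0.02,
    (date(2020, 1, 6), 'AAA'): 0.03,
}

X_train, y_train, X_test = split_by_date(features, labels, date(2020, 1, 1))
print(len(X_train), len(y_train), len(X_test))
```

Splitting by date rather than at random keeps the model from being trained on information that postdates the rows it is evaluated on.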
XGBModel
Wraps the XGBoost model.
DEnsmbleModel
Wraps the Double Ensemble model.
CatBoostModel
Wraps the CatBoost model.
LinearModel
Wraps a linear model.
TabnetModel
Wraps the TabNet model.
DNNModel
Wraps a deep neural network model.
The wrappers let you focus on features and strategy without getting bogged down in model details.
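The idea of adapting different libraries to one fit/predict interface can be sketched with a small adapter class. Everything here is hypothetical: ThirdPartyRegressor stands in for a library with its own method names, and FitPredictAdapter is not finlab's WrapperModel, only an illustration of the pattern.

```python
class ThirdPartyRegressor:
    """Hypothetical model with its own API (train/infer)."""

    def train(self, rows, targets):
        # Toy model: remember the average target.
        self.mean_ = sum(targets) / len(targets)

    def infer(self, rows):
        return [self.mean_ for _ in rows]


class FitPredictAdapter:
    """Expose any train/infer model through fit/predict."""

    def __init__(self, inner):
        self.inner = inner

    def fit(self, X, y):
        self.inner.train(X, y)
        return self

    def predict(self, X):
        return self.inner.infer(X)


model = FitPredictAdapter(ThirdPartyRegressor())
model.fit([[1.0], [2.0]], [1.0, 3.0])
y_pred = model.predict([[3.0]])
print(y_pred)
```

Calling code only ever touches fit and predict, so swapping the underlying model does not change the rest of the pipeline.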
get_models
get_models returns a dictionary that maps model names to the available model classes, so you can look up a model by name, instantiate it, and experiment with several models quickly.
import finlab.ml.qlib as q
# Get all available models
models = q.get_models()
# Print all model names
print(list(models.keys()))
# Select and instantiate a model, e.g., LightGBM
model = models['LGBModel']()
# Assuming X_train, y_train, X_test are already prepared
# Train model
model.fit(X_train, y_train)
# Predict using the trained model
y_pred = model.predict(X_test)
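Because the example above instantiates a model with models['LGBModel'](), a name-to-class mapping lends itself to a loop that trains and scores every candidate. The sketch below uses a hypothetical stand-in registry with two toy models so that it runs without finlab; the scoring function and data are also made up.

```python
from statistics import mean, median

class MeanModel:
    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value] * len(X)

class MedianModel:
    def fit(self, X, y):
        self.value = median(y)
        return self

    def predict(self, X):
        return [self.value] * len(X)

def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def compare(registry, X_train, y_train, X_valid, y_valid):
    """Train one instance of every registered class and score it."""
    scores = {}
    for name, model_cls in registry.items():
        model = model_cls().fit(X_train, y_train)
        scores[name] = mean_abs_error(y_valid, model.predict(X_valid))
    return scores

# Stand-in for the mapping returned by get_models (an assumption).
registry = {'MeanModel': MeanModel, 'MedianModel': MedianModel}
X_train, y_train = [[0]] * 5, [0.0, 0.0, 0.0, 0.0, 1.0]
X_valid, y_valid = [[0]] * 2, [0.0, 0.1]

scores = compare(registry, X_train, y_train, X_valid, y_valid)
best = min(scores, key=scores.get)
print(scores, best)
```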
Running Backtests
Use sim to backtest, computing strategy returns and risk metrics:
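As a rough illustration of what "strategy returns and risk metrics" can mean, the sketch below computes a cumulative return and a maximum drawdown from a series of per-period strategy returns. It is not finlab's sim implementation, and which metrics sim reports is not listed here; these two are common examples, and the input returns are made up.

```python
def cumulative_return(period_returns):
    """Compound per-period returns into a total return."""
    equity = 1.0
    for r in period_returns:
        equity *= 1.0 + r
    return equity - 1.0

def max_drawdown(period_returns):
    """Largest peak-to-trough decline of the equity curve (a value <= 0)."""
    equity = peak = 1.0
    worst = 0.0
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

strategy_returns = [0.10, -0.20, 0.05]  # made-up per-period returns
total = cumulative_return(strategy_returns)
drawdown = max_drawdown(strategy_returns)
print(round(total, 4), round(drawdown, 4))
```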