Machine Learning

Library Installation

Required Installation

Install core packages (finlab, TA-Lib):

pip install finlab
pip install ta-lib

Model Libraries (Optional)

finlab.ml.qlib supports various models (LightGBM, XGBoost, CatBoost, PyTorch, TensorFlow, etc.). Install them as needed following their official documentation; most are pre-installed in Colab.

ML workflows are complex: consider using an AI assistant

From feature engineering to model training, ML strategies involve many steps. After installing FinLab Skill, the AI coding assistant can help you select features, split datasets, train models, and interpret backtest results.

Feature Processing

Using the Combine Function to Merge Features

finlab.ml.feature.combine merges features from multiple sources (technical, fundamental, custom) with resampling support. Examples:

Merge P/B ratio and P/E ratio into one feature set:

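from finlab import data
from finlab.ml import feature as mlf

# 股價淨值比 = price-to-book ratio, 本益比 = price-to-earnings ratio
features = mlf.combine({
  'pb': data.get('price_earning_ratio:股價淨值比'),
  'pe': data.get('price_earning_ratio:本益比')
}, resample='W')  # resample every source to weekly frequency

features.head()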

	pb	pe
(Timestamp('2010-01-04 00:00:00'), '1101')	1.47	18.85
(Timestamp('2010-01-04 00:00:00'), '1102')	1.44	14.58
(Timestamp('2010-01-04 00:00:00'), '1103')	0.79	40.89
(Timestamp('2010-01-04 00:00:00'), '1104')	0.92	73.6
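To make the shape of the result easier to picture, here is a minimal pandas-only sketch of the idea behind combine: resample each source to a common frequency, reshape it to a (datetime, instrument) index, and join the sources column-wise. The helper names to_long and combine_sketch and the sample data are hypothetical; this is not finlab's implementation, and the real function may handle alignment and missing values differently.

```python
import numpy as np
import pandas as pd

# Hypothetical wide tables: rows are business days, columns are stock IDs.
dates = pd.date_range('2024-01-01', periods=10, freq='B', name='datetime')
pb = pd.DataFrame({'1101': np.arange(0.0, 10.0), '1102': np.arange(10.0, 20.0)}, index=dates)
pe = pd.DataFrame({'1101': np.arange(20.0, 30.0), '1102': np.arange(30.0, 40.0)}, index=dates)

def to_long(wide, name):
    """Turn a (date x stock) table into a Series indexed by (datetime, instrument)."""
    s = wide.unstack()  # index levels come out as (stock, date)
    s.index.names = ['instrument', 'datetime']
    return s.swaplevel().sort_index().rename(name)

def combine_sketch(sources, resample='W'):
    """Resample every source to the same frequency, then join them column-wise."""
    columns = [
        to_long(df.resample(resample).last(), name)  # keep the last value of each period
        for name, df in sources.items()
    ]
    return pd.concat(columns, axis=1)

features = combine_sketch({'pb': pb, 'pe': pe}, resample='W')
print(features)
```

Each row of the result is one (week, stock) pair and each column is one feature, which is the layout the label and model functions later in this guide work with.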

Merge technical indicators into one feature set:

from finlab.ml import feature as mlf
mlf.combine({
  'talib': mlf.ta(mlf.ta_names(n=1))
})

	talib.HT_DCPERIOD__real__	talib.HT_DCPHASE__real__	talib.HT_PHASOR__quadrature__
(Timestamp('2024-04-01 00:00:00'), '9951')	23.4372	122.135	-0.0107087
(Timestamp('2024-04-01 00:00:00'), '9955')	18.4416	68.0654	-0.0168584
(Timestamp('2024-04-01 00:00:00'), '9958')	30.1035	-10.7866	0.159777
(Timestamp('2024-04-01 00:00:00'), '9960')	17.5025	94.0009	0.00310615
(Timestamp('2024-04-01 00:00:00'), '9962')	23.2931	90.0781	-0.0145453

Using TA-Lib to Generate Technical Indicators

finlab.ml.feature exposes two functions, ta and ta_names, for generating TA-Lib indicator features.

ta_names Function

ta_names generates a list of TA-Lib indicator names, one per parameter configuration.

n: number of random parameter configurations generated per indicator. For example, n=10 yields 10 variants of each TA-Lib indicator.

from finlab.ml import feature as mlf
mlf.ta_names(n=1)

['talib.HT_DCPERIOD__real__',
 'talib.HT_DCPHASE__real__',
 'talib.HT_PHASOR__quadrature__',
 'talib.HT_PHASOR__inphase__',
 'talib.HT_SINE__sine__',
 'talib.HT_SINE__leadsine__'
 ...
 ]

ta Function

ta computes the actual values for a list of names produced by ta_names.

resample: optional parameter that resamples the computed indicator values to a given time frequency.

from finlab.ml import feature as mlf
mlf.ta(['talib.HT_DCPERIOD__real__',
 'talib.HT_DCPHASE__real__',
 'talib.HT_PHASOR__quadrature__'], resample='W')

	talib.HT_DCPERIOD__real__	talib.HT_DCPHASE__real__
(Timestamp('2024-04-07 00:00:00'), '9951')	23.4372	122.135
(Timestamp('2024-04-07 00:00:00'), '9955')	18.4416	68.0654
(Timestamp('2024-04-07 00:00:00'), '9958')	30.1035	-10.7866
(Timestamp('2024-04-07 00:00:00'), '9960')	17.5025	94.0009
(Timestamp('2024-04-07 00:00:00'), '9962')	23.2931	90.0781

Label Generation

Using the Label Function to Generate Labels

finlab.ml.label provides various return/risk label computations for training prediction models.

Predicting daytrading_percentage

This function computes the percentage change in market prices within a given period, specifically from open to close.

resample: must match the resample used in combine so that the time periods align.
period: number of future periods (in units of resample) over which the change is computed.

from finlab.ml import feature as mlf
from finlab.ml import label as mll
feature = mlf.combine(...)
label = mll.daytrading_percentage(feature.index)

datetime    instrument
2007-04-23  0015          0.000000
            0050          0.003454
            0051          0.004874
            0052          0.006510
            01001T        0.001509
dtype: float64
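The open-to-close change described above can be illustrated with plain pandas. This is only a sketch under assumptions: the price tables are made up, and it measures the change within the same row, whereas the source does not say which day finlab assigns to each index entry.

```python
import pandas as pd

# Hypothetical open and close prices (rows: dates, columns: stock IDs).
dates = pd.to_datetime(['2024-01-02', '2024-01-03'])
open_ = pd.DataFrame({'0050': [100.0, 102.0], '0051': [50.0, 50.0]}, index=dates)
close = pd.DataFrame({'0050': [101.0, 99.96], '0051': [50.5, 49.0]}, index=dates)

# Percentage change from open to close within each session.
intraday_return = close / open_ - 1
print(intraday_return)
```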

Predicting N-day Future Returns

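Computes the percentage price change over the next period periods, where the length of one period is set by resample (one week in the example below). This suits the analysis of medium to long-term performance.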

label = mll.return_percentage(feature.index, resample='W', period=1)
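As a rough illustration of what a forward-return label means, the sketch below computes it by hand on a weekly price table. The function forward_return and the prices are hypothetical, and finlab may choose entry and exit prices differently; the point is only that the label at each row looks period rows into the future.

```python
import pandas as pd

# Hypothetical weekly closing prices (rows: week-ending dates, columns: stock IDs).
dates = pd.date_range('2024-01-07', periods=4, freq='W', name='datetime')
close = pd.DataFrame(
    {'1101': [10.0, 11.0, 12.1, 12.1], '1102': [20.0, 19.0, 19.0, 20.9]},
    index=dates,
)

def forward_return(price, period=1):
    """Return from each row to the row `period` steps later."""
    return price.shift(-period) / price - 1

label = forward_return(close, period=1)
print(label)
```

The last period rows have no future price, so their label is NaN; such rows cannot be used for training.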

Maximum Adverse Excursion

MAE: Maximum adverse movement during the holding period (currently does not support resample).

label = mll.maximum_adverse_excursion(feature.index, period=1)

Maximum Favorable Excursion

MFE: Maximum favorable movement during the holding period (currently does not support resample).

label = mll.maximum_favorable_excursion(feature.index, period=1)
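MAE and MFE can be sketched on a single price series as the worst and best move, relative to the entry price, over the following period rows. The helper excursions is hypothetical, and the window convention (it excludes the entry row and uses closing prices only) is an assumption; finlab's exact definition may differ.

```python
import pandas as pd

def excursions(price, period):
    """Return (mae, mfe): worst and best relative move over the next `period` rows."""
    mae, mfe = [], []
    for i in range(len(price)):
        window = price.iloc[i + 1 : i + 1 + period]
        if len(window) < period:
            # Not enough future data to cover the holding period.
            mae.append(float('nan'))
            mfe.append(float('nan'))
            continue
        entry = price.iloc[i]
        mae.append(window.min() / entry - 1)
        mfe.append(window.max() / entry - 1)
    return pd.Series(mae, index=price.index), pd.Series(mfe, index=price.index)

# Hypothetical daily closing prices for one stock.
close = pd.Series(
    [100.0, 95.0, 110.0, 105.0, 90.0],
    index=pd.date_range('2024-01-01', periods=5, freq='D'),
)
mae, mfe = excursions(close, period=2)
print(pd.DataFrame({'close': close, 'mae': mae, 'mfe': mfe}))
```

For the first row the next two prices are 95 and 110, so the sketch gives an MAE of about -5% and an MFE of about +10%.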

Excess Over Median

Excess return relative to the market-wide median return for the same period.

label = mll.excess_over_median(feature.index, resample='M', period=1)

Excess Over Mean

Excess return relative to the market-wide mean return for the same period.

label = mll.excess_over_mean(feature.index, resample='M', period=1)
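Both excess labels subtract a cross-sectional benchmark from each stock's return. A minimal pandas sketch, using a made-up return table and treating the stocks in the table as the whole market, could look like this:

```python
import pandas as pd

# Hypothetical period returns (rows: period end dates, columns: stock IDs).
returns = pd.DataFrame(
    {'1101': [0.02, -0.01], '1102': [0.05, 0.00], '1103': [-0.01, 0.04]},
    index=pd.to_datetime(['2024-01-31', '2024-02-29']),
)

# Subtract each period's cross-sectional median (or mean) from every stock.
excess_over_median = returns.sub(returns.median(axis=1), axis=0)
excess_over_mean = returns.sub(returns.mean(axis=1), axis=0)

print(excess_over_median)
print(excess_over_mean)
```

By construction the mean-based label sums to zero across stocks in every period, and about half of the median-based labels are positive, which keeps the target balanced regardless of whether the market rose or fell.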

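Make sure each label is built from the same index as the features (feature.index), with the same resample setting and the correct market setting. The label then aligns with the features row by row, and the two can be passed to a model for training together.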

Model Training with Qlib

WrapperModel adapts LightGBM, XGBoost, CatBoost, linear models, TabNet, DNN, etc. into a uniform fit/predict interface.

LGBModel

Wraps the LightGBM model.

import finlab.ml.qlib as q

# Construct X_train, y_train, X_test

model = q.LGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

How to construct X_train, y_train, X_test?

Example using data before 2020 as the training set:

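# features: output of mlf.combine; labels: output of a finlab.ml.label function
# Boolean mask: True for rows dated before 2020-01-01
is_train = features.index.get_level_values('datetime') < '2020-01-01'
X_train = features[is_train]
y_train = labels[is_train]
X_test = features[~is_train]  # everything from 2020 onward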
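The split can be tried end to end on synthetic data. The sketch below builds a small (datetime, instrument) index by hand, so every value in it is made up. It also drops training rows whose label is NaN; that step is an addition not shown in the original example, included on the assumption that labels near the end of the data have no future prices to look at.

```python
import numpy as np
import pandas as pd

# Synthetic features and labels on a (datetime, instrument) index.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2019-06-28', '2019-12-31', '2020-01-02', '2020-06-30']),
     ['1101', '1102']],
    names=['datetime', 'instrument'],
)
features = pd.DataFrame({'pb': np.arange(8.0), 'pe': np.arange(8.0) * 2}, index=index)
labels = pd.Series([0.01, np.nan, 0.02, -0.01, 0.03, 0.00, np.nan, np.nan], index=index)

# Time-based split: no row from 2020 onward leaks into training.
is_train = features.index.get_level_values('datetime') < '2020-01-01'
has_label = labels.notna().to_numpy()  # assumption: unlabeled rows are unusable for fitting

X_train = features[is_train & has_label]
y_train = labels[is_train & has_label]
X_test = features[~is_train]

print(len(X_train), len(y_train), len(X_test))
```

Splitting by date rather than at random matters for time series: a random split would let the model train on information from after the dates it is tested on.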

XGBModel

Wraps the XGBoost model.

model = q.XGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

DEnsmbleModel

Wraps the Double Ensemble model.

model = q.DEnsmbleModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

CatBoostModel

Wraps the CatBoost model.

model = q.CatBoostModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

LinearModel

Wraps a linear model.

model = q.LinearModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

TabnetModel

Wraps the TabNet model.

model = q.TabnetModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

DNNModel

Wraps a deep neural network model.

model = q.DNNModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

The wrappers let you focus on features and strategy without getting bogged down in model details.

get_models

get_models returns a dictionary that maps each available model name to its constructor, so you can look a model up by name, instantiate it, and experiment with several models in the same way.

import finlab.ml.qlib as q

# Get all available models
models = q.get_models()

# Print all model names
print(list(models.keys()))

# Select and instantiate a model, e.g., LightGBM
model = models['LGBModel']()

# Assuming X_train, y_train, X_test are already prepared

# Train model
model.fit(X_train, y_train)

# Predict using the trained model
y_pred = model.predict(X_test)
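Because every wrapper shares the same fit/predict interface, trying several models comes down to a loop over a name-to-constructor dictionary. The sketch below shows the pattern with two hypothetical stand-in classes (MeanModel and FirstColumnModel) so that it runs without finlab installed; they are not part of finlab.

```python
import pandas as pd

class MeanModel:
    """Hypothetical stand-in: always predicts the mean of the training labels."""
    def fit(self, X, y):
        self.mean_ = y.mean()
        return self

    def predict(self, X):
        return pd.Series(self.mean_, index=X.index)

class FirstColumnModel:
    """Hypothetical stand-in: returns the first feature column as the prediction."""
    def fit(self, X, y):
        return self

    def predict(self, X):
        return X.iloc[:, 0]

# Same shape as the dictionary used above: name -> constructor.
models = {'MeanModel': MeanModel, 'FirstColumnModel': FirstColumnModel}

X_train = pd.DataFrame({'f': [1.0, 2.0, 3.0]})
y_train = pd.Series([1.0, 2.0, 6.0])
X_test = pd.DataFrame({'f': [4.0, 5.0]})

predictions = {}
for name, constructor in models.items():
    model = constructor()
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)

print(predictions)
```

With finlab installed, the same loop should work with models = q.get_models() in place of the stand-in dictionary, provided the libraries behind each model are available.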

Running Backtests

Use sim to backtest, computing strategy returns and risk metrics:

from finlab.backtest import sim

# Hold the 50 stocks with the largest predicted values
position = y_pred.is_largest(50)

# Backtest, rebalancing every 4 weeks
sim(position, resample='4W')
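To see what a top-N selection produces, the sketch below rebuilds the idea with plain pandas: rank the predictions within each date and keep the N highest as a boolean position table. The predictions are made up and are assumed to be in long (datetime, instrument) form; this is an approximation of the concept, not finlab's is_largest implementation.

```python
import pandas as pd

# Hypothetical predictions in long form.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2024-01-07', '2024-01-14']), ['1101', '1102', '1103']],
    names=['datetime', 'instrument'],
)
y_pred_long = pd.Series([0.03, 0.01, 0.02, -0.01, 0.05, 0.04], index=index)

# Pivot to a (date x stock) table, then keep the top 2 scores on each date.
scores = y_pred_long.unstack('instrument')
position = scores.rank(axis=1, ascending=False) <= 2

print(position)
```

Each True marks a stock held on that date. A table of this shape, with dates as rows and stocks as columns, is what a position passed to a backtest represents.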