Machine Learning
Library Installation
Required Installation
Install core packages (finlab, TA-Lib):
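After installing, a quick check that both packages are importable can save debugging time later. The snippet below is a minimal sketch: `missing_packages` is a hypothetical helper (not part of finlab), and it assumes the import names are `finlab` and `talib`, the latter being the import name of the TA-Lib Python wrapper.

```python
import importlib.util


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the two core packages.
missing = missing_packages(["finlab", "talib"])
print("Missing packages:", missing or "none")
```

An empty result means both packages resolved; anything listed still needs to be installed.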
Model Libraries (Optional)
finlab.ml.qlib supports various models (LightGBM, XGBoost, CatBoost, PyTorch, TensorFlow, etc.). Install them as needed following their official documentation; most are pre-installed in Colab.
ML workflows are complex -- consider using an AI assistant
From feature engineering to model training, ML strategies involve many steps. After installing FinLab Skill, the AI coding assistant can help you select features, split datasets, train models, and interpret backtest results.
Feature Processing
Using the Combine Function to Merge Features
finlab.ml.feature.combine merges features from multiple sources (technical, fundamental, custom) with resampling support. Examples:
- Merge P/B ratio and P/E ratio into one feature set:
```python
from finlab import data
from finlab.ml import feature as mlf

features = mlf.combine({
    'pb': data.get('price_earning_ratio:股價淨值比'),
    'pe': data.get('price_earning_ratio:本益比')
}, resample='W')

features.head()
```
| | pb | pe |
|---|---|---|
| (Timestamp('2010-01-04 00:00:00'), '1101') | 1.47 | 18.85 |
| (Timestamp('2010-01-04 00:00:00'), '1102') | 1.44 | 14.58 |
| (Timestamp('2010-01-04 00:00:00'), '1103') | 0.79 | 40.89 |
| (Timestamp('2010-01-04 00:00:00'), '1104') | 0.92 | 73.6 |
This merges P/B and P/E into a single feature set, resampled to weekly frequency (resample='W') and indexed by (date, stock ID).
- Merge technical indicators into one feature set:
| | talib.HT_DCPERIOD__real__ | talib.HT_DCPHASE__real__ | talib.HT_PHASOR__quadrature__ |
|---|---|---|---|
| (Timestamp('2024-04-01 00:00:00'), '9951') | 23.4372 | 122.135 | -0.0107087 |
| (Timestamp('2024-04-01 00:00:00'), '9955') | 18.4416 | 68.0654 | -0.0168584 |
| (Timestamp('2024-04-01 00:00:00'), '9958') | 30.1035 | -10.7866 | 0.159777 |
| (Timestamp('2024-04-01 00:00:00'), '9960') | 17.5025 | 94.0009 | 0.00310615 |
| (Timestamp('2024-04-01 00:00:00'), '9962') | 23.2931 | 90.0781 | -0.0145453 |
In this example, mlf.ta_names(n=1) first generates a list of technical indicator names with one random parameter configuration per indicator. mlf.ta then computes the corresponding indicator values, and combine merges them into a single DataFrame.
These two examples show that combine handles fundamental and technical features through the same interface.
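To make the merge step concrete, here is a minimal plain-Python sketch of combining named feature sources into one row per (date, stock ID) key. It is not the finlab implementation: `combine_features` is a hypothetical helper, the handling of missing values (stored as `None`) is an assumption, and resampling is left out. The toy values are taken from the P/B and P/E table above.

```python
def combine_features(sources):
    """Merge {name: {(date, instrument): value}} into one row per key."""
    # Union of all (date, instrument) keys seen in any source.
    keys = sorted(set().union(*(values.keys() for values in sources.values())))
    # One row per key, one column per feature; None marks a missing value
    # (an assumption of this sketch, not documented finlab behavior).
    return {
        key: {name: values.get(key) for name, values in sources.items()}
        for key in keys
    }


# Toy inputs keyed by (date, stock ID).
pb = {("2010-01-04", "1101"): 1.47, ("2010-01-04", "1102"): 1.44}
pe = {("2010-01-04", "1101"): 18.85}

rows = combine_features({"pb": pb, "pe": pe})
for key, row in rows.items():
    print(key, row)
```

The dictionary keys of `sources` become the column names, which mirrors how 'pb' and 'pe' name the columns in the first example.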
Using TA-Lib to Generate Technical Indicators
Technical indicators are a common feature source when developing quantitative trading strategies with finlab. The ta and ta_names functions generate these indicator features.
ta_names Function
The ta_names function generates a list of TA-Lib technical indicator names. Each name encodes the indicator's computation method and parameters, so you can experiment with different indicator configurations when searching for a useful feature combination.
- n parameter: In ta_names, the n parameter specifies how many random parameter configurations are generated for each indicator. For example, if n=10, ta_names generates 10 different parameter configurations for each TA-Lib indicator, so you can compare many settings and explore the relationship between data and strategy performance.
```
['talib.HT_DCPERIOD__real__',
 'talib.HT_DCPHASE__real__',
 'talib.HT_PHASOR__quadrature__',
 'talib.HT_PHASOR__inphase__',
 'talib.HT_SINE__sine__',
 'talib.HT_SINE__leadsine__',
 ...
]
```
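The names in this list follow a visible pattern: a library prefix, a function name, and an output name separated by double underscores. The sketch below splits a name into those parts. `split_feature_name` is a hypothetical helper, and reading the trailing segment as the place where parameter settings go is an assumption; in the names listed here that segment is empty.

```python
def split_feature_name(name):
    """Split a name such as 'talib.HT_PHASOR__quadrature__' into its parts."""
    library, _, rest = name.partition(".")
    # Assumed layout: <function>__<output>__<remainder>
    parts = rest.split("__", 2)
    parts += [""] * (3 - len(parts))  # pad if a segment is absent
    function, output, remainder = parts
    return {
        "library": library,
        "function": function,
        "output": output,
        "remainder": remainder,  # assumed to hold parameter settings, if any
    }


for feature_name in ["talib.HT_DCPERIOD__real__", "talib.HT_PHASOR__quadrature__"]:
    print(split_feature_name(feature_name))
```

Grouping names by the "function" field is one way to see which outputs belong to the same indicator, for example the quadrature and inphase outputs of HT_PHASOR.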
ta Function
Once you have the indicator name list (obtainable via ta_names), use the ta function to compute the actual indicator values from the specified names and parameter settings.
- Functionality: ta accepts one or more indicator names generated by ta_names and computes their values. This is the feature engineering step that turns indicator names into model inputs.
- Flexibility: Used together, the two functions let you test and tune strategies across different time periods and market conditions.
- resample parameter: ta also supports a resample parameter that resamples the computed indicator values to a specified time frequency, which is useful when features must align with a weekly or monthly rebalancing schedule.
```python
from finlab.ml import feature as mlf

mlf.ta(['talib.HT_DCPERIOD__real__',
        'talib.HT_DCPHASE__real__',
        'talib.HT_PHASOR__quadrature__'], resample='W')
```
| | talib.HT_DCPERIOD__real__ | talib.HT_DCPHASE__real__ |
|---|---|---|
| (Timestamp('2024-04-07 00:00:00'), '9951') | 23.4372 | 122.135 |
| (Timestamp('2024-04-07 00:00:00'), '9955') | 18.4416 | 68.0654 |
| (Timestamp('2024-04-07 00:00:00'), '9958') | 30.1035 | -10.7866 |
| (Timestamp('2024-04-07 00:00:00'), '9960') | 17.5025 | 94.0009 |
| (Timestamp('2024-04-07 00:00:00'), '9962') | 23.2931 | 90.0781 |
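The timestamps in this output fall on 2024-04-07, a Sunday, which is consistent with the pandas 'W' alias labelling each week by its closing Sunday. The following plain-Python sketch shows one way such weekly resampling can work. Keeping the last daily value of each week is an assumption of this sketch; the aggregation finlab applies is not stated here, and `week_end` and `resample_weekly_last` are hypothetical helpers.

```python
from datetime import date, timedelta


def week_end(day):
    """Return the Sunday that closes the week containing `day`."""
    return day + timedelta(days=6 - day.weekday())  # Monday=0 ... Sunday=6


def resample_weekly_last(daily):
    """Map each week-ending Sunday to the last daily value seen in that week."""
    weekly = {}
    for day in sorted(daily):
        weekly[week_end(day)] = daily[day]  # later days overwrite earlier ones
    return weekly


# Toy daily indicator values for one stock.
daily_values = {
    date(2024, 4, 1): 21.0,
    date(2024, 4, 3): 22.5,
    date(2024, 4, 5): 23.4,
    date(2024, 4, 8): 24.1,
}
weekly_values = resample_weekly_last(daily_values)
for day, value in weekly_values.items():
    print(day.isoformat(), value)
```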
In summary, ta_names and ta are the two core finlab tools for generating and computing technical indicator features. Use the n parameter of ta_names to vary parameter settings, then use ta to compute the indicator values under those settings and compare the resulting indicator combinations.
Label Generation
Using the Label Function to Generate Labels
finlab.ml.label provides various return/risk label computations for training prediction models.
Predicting daytrading_percentage
This function computes the percentage change in market prices within a given period, specifically from open to close.
- resample: Must match the resample used in combine to align time periods.
- period: Number of future periods (defined by resample) for computing change.
```python
from finlab.ml import feature as mlf
from finlab.ml import label as mll

feature = mlf.combine(...)
label = mll.daytrading_percentage(feature.index)
```
```
datetime    instrument
2007-04-23  0015          0.000000
            0050          0.003454
            0051          0.004874
            0052          0.006510
            01001T        0.001509
dtype: float64
```
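As a rough illustration of what an open-to-close label means, the sketch below computes close / open - 1 for every key in a feature index. It is plain Python, not the finlab implementation: `open_to_close_labels` is a hypothetical helper, and using the bar with the same (date, instrument) key as the feature row is an assumption, since the exact timing of the real label is not spelled out here.

```python
def open_to_close_labels(bars, index):
    """Return {(date, instrument): close / open - 1} for keys present in `bars`.

    bars:  {(date, instrument): (open_price, close_price)}
    index: iterable of (date, instrument) keys, e.g. taken from the features
    """
    labels = {}
    for key in index:
        if key in bars:
            open_price, close_price = bars[key]
            labels[key] = close_price / open_price - 1.0
    return labels


# Toy bars; prices are illustrative only.
bars = {
    ("2007-04-23", "0050"): (100.0, 103.0),
    ("2007-04-23", "0051"): (50.0, 49.0),
}
feature_index = [("2007-04-23", "0050"), ("2007-04-23", "0051")]
day_labels = open_to_close_labels(bars, feature_index)
print(day_labels)
```

Passing the feature index into the label function is what keeps features and labels aligned row by row.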
Predicting N-day Future Returns
Computes the percentage change within a given period for analyzing medium to long-term performance.
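A minimal sketch of an N-period future return, assuming the usual definition price[t + N] / price[t] - 1 on a chronological price list. `future_returns` is a hypothetical helper; the real label function may differ in details such as which price field it uses.

```python
def future_returns(prices, period):
    """Return price[t + period] / price[t] - 1 for each t, or None near the end."""
    results = []
    for i, price in enumerate(prices):
        j = i + period
        # The last `period` entries have no future price yet.
        results.append(prices[j] / price - 1.0 if j < len(prices) else None)
    return results


closes = [100.0, 110.0, 121.0, 115.0]
one_step = future_returns(closes, period=1)
two_step = future_returns(closes, period=2)
print(one_step)
print(two_step)
```

Rows whose label is None have no known outcome yet and should be excluded from training.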
Maximum Adverse Excursion
MAE: Maximum adverse movement during the holding period (currently does not support resample).
Maximum Favorable Excursion
MFE: Maximum favorable movement during the holding period (currently does not support resample).
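MAE and MFE can both be read off the price path after entry. The sketch below assumes a long position, measures both as returns relative to the entry price, and reports MAE as zero or negative and MFE as zero or positive. Sign conventions vary between tools, so treat that choice as an assumption; `excursions` is a hypothetical helper.

```python
def excursions(prices, entry_index, period):
    """Return (mae, mfe) over the `period` prices following `entry_index`."""
    entry = prices[entry_index]
    window = prices[entry_index + 1 : entry_index + 1 + period]
    if not window:
        return None, None  # no future prices available
    mae = min(min(window) / entry - 1.0, 0.0)  # worst move against the position
    mfe = max(max(window) / entry - 1.0, 0.0)  # best move in favor of the position
    return mae, mfe


path = [100.0, 95.0, 108.0, 102.0]
mae, mfe = excursions(path, entry_index=0, period=3)
print(mae, mfe)
```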
Excess Over Median
Excess return relative to the market-wide median return for the same period.
Excess Over Mean
Excess return relative to the market-wide mean return for the same period.
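Both excess labels subtract a cross-sectional benchmark from each stock's return for the same period. The sketch below is a plain-Python illustration for a single period; `excess_returns` is a hypothetical helper, and the toy returns are illustrative only.

```python
from statistics import mean, median


def excess_returns(returns, benchmark=median):
    """Subtract the cross-sectional benchmark (median or mean) from each return."""
    base = benchmark(list(returns.values()))
    return {instrument: value - base for instrument, value in returns.items()}


# Returns of three stocks over the same period.
period_returns = {"1101": 0.05, "1102": 0.01, "1103": -0.02}
over_median = excess_returns(period_returns, benchmark=median)
over_mean = excess_returns(period_returns, benchmark=mean)
print(over_median)
print(over_mean)
```

The median is less sensitive to a few extreme movers than the mean, which is one reason to prefer it as the benchmark.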
Ensure the index and market settings are correct; labels can be directly combined with features for model training.
Model Training with Qlib
This section shows how to use various machine learning models within the Qlib framework for quantitative investment strategy development. WrapperModel is a wrapper class for initializing and fitting different ML models, including LightGBM, XGBoost, CatBoost, linear models, TabNet, and deep neural networks. It gives these models a simpler, unified interface in Qlib.
Below is a brief introduction and example usage for each model wrapper:
LGBModel
Wraps the LightGBM model.
```python
import finlab.ml.qlib as q

# Construct X_train, y_train, X_test
model = q.LGBModel()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
How to construct X_train, y_train, X_test?
Example using data before 2020 as the training set:
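The split logic can be sketched in plain Python as follows: rows dated before the cutoff form the training set, and rows on or after it form the test set. This is an illustration under stated assumptions, not the original listing. `split_by_date` is a hypothetical helper, features and labels are plain dicts keyed by (date, instrument), and rows without a known label are dropped from training.

```python
from datetime import date


def split_by_date(features, labels, cutoff):
    """Split rows keyed by (date, instrument) into train/test around `cutoff`."""
    X_train, y_train, X_test = [], [], []
    for key in sorted(features):
        day, _instrument = key
        if day < cutoff:
            # Train only on rows whose label is known.
            if labels.get(key) is not None:
                X_train.append(features[key])
                y_train.append(labels[key])
        else:
            X_test.append(features[key])
    return X_train, y_train, X_test


# Toy feature rows [pb, pe] and labels; values are illustrative only.
toy_features = {
    (date(2019, 12, 30), "1101"): [1.47, 18.85],
    (date(2019, 12, 31), "1101"): [1.48, 18.90],
    (date(2020, 1, 2), "1101"): [1.50, 19.10],
}
toy_labels = {
    (date(2019, 12, 30), "1101"): 0.004,
    (date(2019, 12, 31), "1101"): -0.002,
}
X_train, y_train, X_test = split_by_date(toy_features, toy_labels, date(2020, 1, 1))
print(len(X_train), len(y_train), len(X_test))
```

With pandas objects, the same date comparison would be applied to the date level of the feature index instead of looping over keys. Splitting by time rather than at random keeps future information out of the training set.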
XGBModel
Wraps the XGBoost model.
DEnsmbleModel
Wraps the Double Ensemble model.
CatBoostModel
Wraps the CatBoost model.
LinearModel
Wraps a linear model.
TabnetModel
Wraps the TabNet model.
DNNModel
Wraps a deep neural network model.
The wrappers let you focus on features and strategy without getting bogged down in model details.
get_models
get_models quickly retrieves the list of available models and initializes them, facilitating multi-model experimentation.
```python
import finlab.ml.qlib as q

# Get all available models
models = q.get_models()

# Print all model names
print(list(models.keys()))

# Select and instantiate a model, e.g., LightGBM
model = models['LGBModel']()

# Assuming X_train, y_train, X_test are already prepared
# Train model
model.fit(X_train, y_train)

# Predict using the trained model
y_pred = model.predict(X_test)
```
The above demonstrates how to list models, create an LGBModel instance, and train/predict.
Running Backtests
Use sim to backtest, computing strategy returns and risk metrics:
The backtest uses model rankings to generate position, sets the backtesting frequency with resample, and then runs the simulation.
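The ranking step can be sketched in plain Python: for each date, sort stocks by predicted score and hold the top n. This is a minimal illustration, not the backtest call itself. `top_n_positions` is a hypothetical helper, and the choice of n, the tie-breaking, and the toy scores are assumptions.

```python
def top_n_positions(predictions, n):
    """Return {date: set of the n instruments with the highest predicted score}."""
    positions = {}
    for day, scores in predictions.items():
        ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
        positions[day] = set(ranked[:n])
    return positions


# Toy model predictions per rebalancing date.
predictions = {
    "2024-04-07": {"9951": 0.012, "9955": 0.030, "9958": -0.004},
    "2024-04-14": {"9951": 0.021, "9955": -0.010, "9958": 0.015},
}
held = top_n_positions(predictions, n=2)
for day, instruments in held.items():
    print(day, sorted(instruments))
```

In an actual backtest, a selection like this would be expressed as the position passed to sim, with one row per rebalancing date.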