finlab.tools
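Utility module providing event-study and factor-analysis tools.
Use cases
- Analyze how a specific event affects stock prices (event study)
- Evaluate a factor's stock-selection ability (factor analysis)
- Optimize a strategy's factor combination
- Understand where a strategy's performance comes from
Quick examples
Event study
Factor analysis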
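from finlab import data
from finlab.tools import factor_analysis as fa
# Load the raw data used to build the factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
# Build boolean factors
cond1 = marketcap.rank(pct=True, axis=1) < 0.3  # small market cap (bottom 30%)
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth (top 30%)
# Build the feature matrix and excess-return labels, following the
# generate_features_and_labels entry in the API reference below.
# Resampling on the revenue index is an assumption made for this example.
features, labels = fa.generate_features_and_labels(
    {'small_cap': cond1, 'revenue_growth': cond2}, resample=revenue.index)
# Compute per-period factor returns, one column per factor
factor_return = fa.calc_factor_return(features, labels)
print(factor_return)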
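Detailed tutorial
See the factor analysis tutorial, which covers:
- The complete factor analysis workflow
- Factor return calculation
- Factor IC analysis
- Shapley value contribution analysis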
API Reference
event_study()
finlab.tools.event_study
create_factor_data
Create factor data that includes future returns.
| PARAMETER | DESCRIPTION |
|---|---|
| factor | Factor data where the index is datetime and the columns are asset IDs. |
| adj_close | Adjusted close prices where the index is datetime and the columns are asset IDs. |
| days | Future-return horizons to consider. |
Return
Analytic plots and tables
Warning
This function is not identical to finlab.ml.alphalens.create_factor_data
Examples:
import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study
factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')
# Create the event DataFrame: True on each stock's ex-rights trading date
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()
# Calculate factor_data
factor_data = create_factor_data({'pb': factor}, adj_close, event=event)
# Run the event study, then plot the mean abnormal return per day and its cumulative sum
r = event_study(factor_data, benchmark, adj_close)
plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)
event_study
event_study(factor_data, benchmark_adj_close, stock_adj_close, sample_period=(-45, -20), estimation_period=(-5, 20), plot=True)
Run an event study and return the abnormal returns of each stock on each day.
| PARAMETER | DESCRIPTION |
|---|---|
| factor_data | Factor data where the index is datetime and the columns are asset IDs. |
| benchmark_adj_close | Benchmark prices used for the CAPM fit. |
| stock_adj_close | Stock prices used for the CAPM fit. |
| sample_period | Period for fitting the CAPM (default `(-45, -20)`). |
| estimation_period | Period for calculating alpha, i.e. the abnormal return (default `(-5, 20)`). |
| plot | Whether to plot the result (default `True`). |
Return
Abnormal returns of each stock on each day.
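Event study: analyze how a specific event affects stock prices.
Usage notes
An event study measures the short-term and long-term price impact of a specific event, such as an earnings announcement, an investor conference, or a dividend payment.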
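The parameter descriptions above (a sample period for fitting the CAPM and an estimation period for computing alpha) suggest the standard market-model procedure. The following is a minimal sketch of that procedure on synthetic data, assuming daily simple returns and a single event. The helper name `abnormal_returns` and the synthetic series are illustrative and not part of finlab; the library's own implementation may differ.

```python
import numpy as np
import pandas as pd

def abnormal_returns(stock_ret, bench_ret, event_pos,
                     sample_period=(-45, -20), estimation_period=(-5, 20)):
    """Market-model abnormal returns around one event (hypothetical helper)."""
    # Fit r_stock = alpha + beta * r_bench on the pre-event sample window
    s0, s1 = event_pos + sample_period[0], event_pos + sample_period[1]
    beta, alpha = np.polyfit(bench_ret[s0:s1 + 1], stock_ret[s0:s1 + 1], deg=1)
    # Abnormal return = realized return minus the return the fitted model predicts
    e0, e1 = event_pos + estimation_period[0], event_pos + estimation_period[1]
    expected = alpha + beta * bench_ret[e0:e1 + 1]
    offsets = range(estimation_period[0], estimation_period[1] + 1)
    return pd.Series(stock_ret[e0:e1 + 1] - expected, index=offsets)

# Synthetic returns: the stock follows the benchmark with beta 1.2 plus noise
rng = np.random.default_rng(0)
bench = rng.normal(0.0005, 0.01, 200)
stock = 0.0002 + 1.2 * bench + rng.normal(0, 0.002, 200)
stock[100] += 0.05  # inject a 5% jump on the event day

ar = abnormal_returns(stock, bench, event_pos=100)
print(ar.loc[0])             # abnormal return on the event day
print(ar.cumsum().iloc[-1])  # cumulative abnormal return over the window
```

Averaging such series across many events, indexed by day offset, gives the kind of per-day mean abnormal return that the plotting lines in the example above draw.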
factor_analysis
finlab.tools.factor_analysis
Factor analysis toolkit -- public facade.
All public symbols are re-exported from focused submodules so that
existing from finlab.tools.factor_analysis import ... imports keep
working unchanged.
Submodules
- finlab.tools.factor_metrics: IC, correlation, NDCG scoring
- finlab.tools.factor_returns: boolean factor returns & Shapley values
- finlab.tools.factor_centrality: PCA-based rolling centrality
- finlab.tools.factor_regression: OLS trend analysis
calc_centrality
Compute rolling PCA centrality over return_df.
| PARAMETER | DESCRIPTION |
|---|---|
| return_df | Time-series DataFrame (dates x assets/factors). |
| window_periods | Rolling window length in rows. |
| n_components | Number of PCA components. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of rolling centrality scores. |
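The exact centrality formula is not stated here. As a rough illustration only, the sketch below uses one common definition: within each rolling window, take the top principal components of the return covariance matrix and average each column's normalized absolute loadings, weighted by the variance share of each component. The helper `rolling_pca_centrality` and the synthetic data are assumptions for illustration and should not be read as finlab's implementation.

```python
import numpy as np
import pandas as pd

def rolling_pca_centrality(return_df, window_periods, n_components):
    """Rolling centrality from PCA loadings (hypothetical formula, not finlab's)."""
    rows = {}
    for end in range(window_periods, len(return_df) + 1):
        window = return_df.iloc[end - window_periods:end]
        # Eigen-decompose the covariance matrix of the window
        eigvals, eigvecs = np.linalg.eigh(np.cov(window.to_numpy(), rowvar=False))
        top = np.argsort(eigvals)[::-1][:n_components]  # largest components first
        weights = eigvals[top] / eigvals.sum()          # variance share per component
        loadings = np.abs(eigvecs[:, top])
        loadings = loadings / loadings.sum(axis=0)      # each component's loadings sum to 1
        # Variance-weighted average of the normalized loadings
        rows[window.index[-1]] = loadings @ weights / weights.sum()
    return pd.DataFrame.from_dict(rows, orient="index", columns=return_df.columns)

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
common = rng.normal(0, 0.01, 60)
returns = pd.DataFrame({
    "a": common + rng.normal(0, 0.001, 60),
    "b": common + rng.normal(0, 0.001, 60),
    "c": rng.normal(0, 0.001, 60),  # barely related to the common driver
}, index=dates)

centrality = rolling_pca_centrality(returns, window_periods=30, n_components=1)
print(centrality.tail(3))
```

Under this definition each row sums to 1, and columns that move with the dominant component ("a" and "b" here) score higher than the independent column "c".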
calc_factor_return
Compute equal-weight portfolio returns per boolean factor.
Each column in features must be boolean. For every date the mean
label value across selected (True) stocks is returned.
| PARAMETER | DESCRIPTION |
|---|---|
| features | Boolean feature DataFrame (MultiIndex: date x stock). |
| labels | Excess-return labels with the same MultiIndex. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of per-period returns, one column per factor, starting from the first fully-populated date. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If any feature column is not boolean. |
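The computation described above (for every date, the mean label across the stocks a boolean factor selects) can be sketched with plain pandas. This is an illustrative re-implementation on toy data, not the library's code: the helper name `equal_weight_factor_return`, the index level names `date` and `stock`, and the sample values are assumptions, and the sketch does not trim the result to the first fully-populated date.

```python
import pandas as pd

def equal_weight_factor_return(features, labels):
    """Mean label of the selected (True) stocks per date and factor (hypothetical helper)."""
    for name in features.columns:
        if not all(v in (True, False) for v in features[name].dropna()):
            raise ValueError(f"feature column {name!r} is not boolean")
    out = {}
    for name in features.columns:
        selected = features[name].fillna(False).astype(bool)
        # Keep only the selected stocks, then average their labels date by date
        out[name] = labels[selected].groupby(level="date").mean()
    return pd.DataFrame(out)

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["1101", "2330", "2454"]],
    names=["date", "stock"])
features = pd.DataFrame({
    "small_cap": [True, False, True, False, True, True],
    "momentum":  [False, True, True, True, False, False],
}, index=index)
labels = pd.Series([0.02, -0.01, 0.04, 0.03, 0.00, -0.02], index=index)

factor_return = equal_weight_factor_return(features, labels)
print(factor_return)
```

On the first date, for example, `small_cap` selects stocks 1101 and 2454, so its return is the average of 0.02 and 0.04.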
calc_ic
Compute per-date IC between features and labels.
| PARAMETER | DESCRIPTION |
|---|---|
| features | MultiIndex DataFrame (date, stock) with factor columns. |
| labels | MultiIndex Series with the same index. |
| rank | If … |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of IC values per date and factor. |
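A per-date IC is a cross-sectional correlation computed separately for each date. The sketch below shows one way to do this with pandas, assuming that `rank=True` means a rank (Spearman) IC, computed here as the Pearson correlation of within-date ranks. That reading of `rank`, the helper name `per_date_ic`, and the toy data are assumptions, not the library's documented behavior.

```python
import pandas as pd

def per_date_ic(features, labels, rank=False):
    """Cross-sectional correlation of each factor with the labels, per date."""
    frame = features.assign(_label=labels)
    if rank:
        # Rank IC: Pearson correlation of within-date ranks
        frame = frame.groupby(level="date").rank()

    def one_date(group):
        return group.drop(columns="_label").corrwith(group["_label"])

    return frame.groupby(level="date").apply(one_date)

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["A", "B", "C", "D"]],
    names=["date", "stock"])
features = pd.DataFrame({
    "value":   [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "inverse": [4.0, 3.0, 2.0, 1.0, 4.0, 3.0, 2.0, 1.0],
}, index=index)
# First date: monotone but non-linear relation; second date: perfectly reversed
labels = pd.Series([0.01, 0.02, 0.03, 0.10, 0.04, 0.03, 0.02, 0.01], index=index)

ic = per_date_ic(features, labels)
rank_ic = per_date_ic(features, labels, rank=True)
print(ic)
print(rank_ic)
```

The rank version is less sensitive to outliers in returns: on the first date the outlying 0.10 return pulls the Pearson IC below 1, while the rank IC stays at exactly 1 because the ordering is preserved.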
calc_metric
Compute a cross-sectional metric between factor and forward returns.
| PARAMETER | DESCRIPTION |
|---|---|
| factor | Single factor DataFrame or dict of named factor DataFrames. |
| adj_close | Adjusted close prices. |
| days | Forward-return horizons (default …). |
| func | Scoring callable applied per date group (default …). |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with one column per … |
calc_regression_stats
Run per-column OLS regressions and classify each trend.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Time-series DataFrame with a … |
| p_value_threshold | Significance level for trend classification. |
| r_squared_threshold | Minimum R-squared for a non-flat trend. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with columns … |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the DataFrame index is not a … |
calc_shapley_values
Compute Shapley values measuring each factor's marginal contribution.
Enumerates all 2^n subsets of factors, computes the equal-weight portfolio return of each subset, and distributes the return evenly among the factors in that subset.
| PARAMETER | DESCRIPTION |
|---|---|
| features | Boolean feature DataFrame (MultiIndex: date x stock). |
| labels | Excess-return labels with the same MultiIndex. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of Shapley values, one column per factor. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If features is empty, labels lacks a MultiIndex, or any feature column is not boolean. |
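Read literally, the description above enumerates factor subsets, computes each subset's equal-weight portfolio return, and splits that return evenly among the subset's factors. The sketch below follows that reading on toy data. It rests on several assumptions that the description does not settle: a subset's portfolio holds the stocks that pass every factor in the subset, the empty subset contributes nothing, and a subset with no stocks on a date contributes 0. The helper name `even_split_attribution` is hypothetical, and this is not the classical permutation-weighted Shapley formula.

```python
from itertools import combinations

import pandas as pd

def even_split_attribution(features, labels):
    """Split each subset portfolio's return evenly among its factors (hypothetical helper)."""
    names = list(features.columns)
    flags = features.fillna(False).astype(bool)
    dates = labels.index.get_level_values("date").unique()
    values = pd.DataFrame(0.0, index=dates, columns=names)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            # Assumption: the subset's portfolio holds stocks passing every factor in it
            selected = flags[list(subset)].all(axis=1)
            subset_return = labels[selected].groupby(level="date").mean()
            subset_return = subset_return.reindex(dates).fillna(0.0)
            for name in subset:
                values[name] += subset_return / size
    return values

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["1101", "2330", "2454"]],
    names=["date", "stock"])
features = pd.DataFrame({
    "small_cap": [True, False, True, False, True, True],
    "momentum":  [False, True, True, True, False, False],
}, index=index)
labels = pd.Series([0.02, -0.01, 0.04, 0.03, 0.00, -0.02], index=index)

values = even_split_attribution(features, labels)
print(values)
```

Because every subset is enumerated, the cost grows as 2^n in the number of factors, so this approach is only practical for a small factor set.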
generate_features_and_labels
Build a feature matrix and an excess-return label vector.
Wraps finlab.ml.feature.combine and finlab.ml.label.excess_over_mean.
| PARAMETER | DESCRIPTION |
|---|---|
| dfs | Dict mapping factor names to DataFrames (or callables). |
| resample | Resampling frequency / index passed to the ML helpers. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[DataFrame, Series] | The feature matrix and the excess-return label vector. |
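To make the shape of the two return values concrete, the sketch below builds a long-format feature matrix and an excess-over-mean label vector by hand with pandas. It is an illustration under stated assumptions rather than the library's logic: the helper `build_features_and_labels`, its `adj_close` and `horizon` parameters, the index level names, and the toy prices are all hypothetical, and no resampling is performed.

```python
import pandas as pd

def build_features_and_labels(dfs, adj_close, horizon=1):
    """Stack wide factor frames and build excess-over-mean labels (hypothetical helper)."""
    # Long-format features: one row per (date, stock), one column per factor
    features = pd.DataFrame({name: df.stack() for name, df in dfs.items()})
    features.index.names = ["date", "stock"]
    # Forward return over `horizon` rows, minus that date's cross-sectional mean
    forward = adj_close.shift(-horizon) / adj_close - 1
    excess = forward.sub(forward.mean(axis=1), axis=0)
    labels = excess.stack().reindex(features.index)
    return features, labels

dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])
adj_close = pd.DataFrame({"A": [100.0, 110.0, 121.0],
                          "B": [100.0, 100.0, 110.0]}, index=dates)
cheap = adj_close.rank(pct=True, axis=1) <= 0.5  # toy boolean factor

features, labels = build_features_and_labels({"cheap": cheap}, adj_close)
print(features)
print(labels)
```

The last date has no forward return, so its labels are NaN; the other dates hold each stock's return relative to the cross-sectional average.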
is_boolean_series
Check whether series contains only boolean-like values (including NaN).
Handles the case where older pandas converts bool + NaN to float.
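A check of this kind can be sketched as follows. The point is that a column of True/False values containing NaN may be stored as object or float dtype, so testing the dtype alone is not enough. The helper name `looks_boolean` is hypothetical and the logic is an assumption about what "boolean-like" means here, not the library's code.

```python
import numpy as np
import pandas as pd

def looks_boolean(series):
    """True if every non-missing value equals True or False (hypothetical helper)."""
    values = series.dropna()
    # In Python, 1.0 == True and 0.0 == False, so float-cast booleans also pass
    return all(v in (True, False) for v in values)

print(looks_boolean(pd.Series([True, False, np.nan])))  # booleans with a missing value
print(looks_boolean(pd.Series([1.0, 0.0, np.nan])))     # booleans stored as floats
print(looks_boolean(pd.Series([0.3, 1.0])))             # a genuinely continuous column
```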
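The factor analysis module evaluates how well factors select stocks and how much each factor contributes.
Main functions:
| Function | Description | Purpose |
|---|---|---|
| calc_factor_return() | Compute factor returns | Evaluate a factor's overall performance |
| calc_ic() | Compute factor IC (information coefficient) | Evaluate the correlation between a factor and future returns |
| calc_shapley_values() | Compute Shapley values | Evaluate each factor's marginal contribution in a multi-factor model |
| calc_centrality() | Compute rolling PCA-based centrality | Evaluate how central each factor (or asset) is within a return time series |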
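calc_factor_return()
Compute factor returns to evaluate a factor's stock-selection ability.
Example:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build the factor: small market cap (bottom 30% each day)
marketcap = data.get('etl:market_value')
factor = marketcap.rank(pct=True, axis=1) < 0.3
# Build the boolean feature matrix and excess-return labels, as described in the
# API reference above. Monthly resampling ('M') is an assumption for this example.
features, labels = fa.generate_features_and_labels({'small_cap': factor}, resample='M')
# Compute factor returns (one column per factor) and pick this factor's column
factor_return = fa.calc_factor_return(features, labels)['small_cap']
# Inspect the results
print(f"Mean factor return: {factor_return.mean():.2%}")
print(f"Factor return std: {factor_return.std():.2%}")
print(f"Factor Sharpe ratio: {factor_return.mean() / factor_return.std():.2f}")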
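Interpretation:
- Positive: the factor has stock-selection ability (the selected stocks perform better)
- Negative: the factor works in reverse (trade the opposite side)
- Close to 0: the factor has no stock-selection ability
Best practices
- Backtest over a long period to verify that the factor is stable
- Compare factor performance across market regimes (bull and bear markets)
- Combine with IC analysis to confirm that the factor is effective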
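calc_ic()
Compute the factor IC (Information Coefficient), which measures the correlation between a factor and future returns.
Example:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build the factor: revenue growth
revenue = data.get('monthly_revenue:當月營收')
factor = revenue.average(3) / revenue.average(12)  # 3-month average vs 12-month average
# Build features and excess-return labels, as described in the API reference above.
# Resampling on the revenue index is an assumption for this example.
features, labels = fa.generate_features_and_labels({'revenue_growth': factor}, resample=revenue.index)
# Compute the per-date IC and pick this factor's column
ic = fa.calc_ic(features, labels)['revenue_growth']
# Inspect the results
print(f"Mean IC: {ic.mean():.4f}")
print(f"IC std: {ic.std():.4f}")
print(f"IC_IR (IC Sharpe ratio): {ic.mean() / ic.std():.2f}")
# Plot the IC time series
ic.plot(title='Revenue growth factor IC')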
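IC rating guidelines:
| IC value | Rating | Notes |
|---|---|---|
| > 0.05 | Excellent | The factor has strong stock-selection ability |
| 0.03 ~ 0.05 | Good | The factor has stock-selection ability |
| 0.01 ~ 0.03 | Average | The factor has weak stock-selection ability |
| < 0.01 | Ineffective | The factor has no stock-selection ability |
Notes on IC analysis
- IC > 0: the factor is positively correlated with future returns (go long high-scoring stocks)
- IC < 0: the factor is negatively correlated with future returns (go long low-scoring stocks)
- Highly volatile IC: the factor is unstable and carries more risk
- IC_IR > 0.5: the factor performs well on a risk-adjusted basis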
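The rating table and notes above can be turned into a small summary helper. The sketch below takes an IC time series (here a made-up sample), computes the mean IC and IC_IR, and applies the table's thresholds to the absolute mean IC while reporting the sign separately. Grading on the absolute value is a design choice of this sketch, and the helper name `summarize_ic` is hypothetical.

```python
import pandas as pd

def summarize_ic(ic):
    """Mean IC, IC_IR, direction, and a rating based on the table above (hypothetical helper)."""
    mean_ic = ic.mean()
    ic_ir = mean_ic / ic.std()  # mean IC divided by its standard deviation
    strength = abs(mean_ic)
    if strength > 0.05:
        grade = "excellent"
    elif strength >= 0.03:
        grade = "good"
    elif strength >= 0.01:
        grade = "average"
    else:
        grade = "ineffective"
    direction = "long high scores" if mean_ic > 0 else "long low scores"
    return {"mean_ic": mean_ic, "ic_ir": ic_ir, "grade": grade, "direction": direction}

# Made-up monthly IC values for illustration
sample_ic = pd.Series([0.06, 0.02, 0.05, 0.03, 0.04])
summary = summarize_ic(sample_ic)
print(summary)
```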
calc_shapley_values()
Compute Shapley values to evaluate each factor's marginal contribution in a multi-factor model.
Example:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build several boolean factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')
factors = {
    'small_cap': marketcap.rank(pct=True, axis=1) < 0.3,  # small market cap
    'revenue_growth': (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7,  # revenue growth
    'momentum': (close / close.shift(20)).rank(pct=True, axis=1) > 0.7,  # momentum
}
# Build features and excess-return labels, as described in the API reference above.
# Resampling on the revenue index is an assumption for this example.
features, labels = fa.generate_features_and_labels(factors, resample=revenue.index)
# Compute Shapley values (one column per factor)
shapley = fa.calc_shapley_values(features, labels)
# Inspect the results
print("Mean Shapley value per factor:")
for name, value in shapley.mean().items():
    print(f"{name}: {value:.4f}")
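Interpretation:
- High Shapley value: the factor contributes a lot and is important
- Low Shapley value: the factor contributes little and is a candidate for removal
- Negative Shapley value: the factor is harmful and drags down overall performance
Recommendations
- Remove factors whose Shapley value is negative or close to 0
- Keep factors with high Shapley values
- Re-evaluate the strategy's performance after removing factors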
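calc_centrality()
Compute factor centrality. Per the API reference above, this is a rolling PCA-based centrality computed over a return time series.
Example:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build boolean factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
factors = {
    'small_cap': marketcap.rank(pct=True, axis=1) < 0.3,
    'revenue_growth': (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7,
}
# Centrality is computed over a return time series, so derive factor returns first
features, labels = fa.generate_features_and_labels(factors, resample=revenue.index)
factor_return = fa.calc_factor_return(features, labels)
# window_periods and n_components below are illustrative values, not recommendations
centrality = fa.calc_centrality(factor_return, window_periods=12, n_components=1)
print(centrality.tail())
Interpretation:
- High centrality: the factor's returns move closely with the dominant components shared by the other factors, so it adds little diversification
- Low centrality: the factor's returns are relatively independent, so it diversifies well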
Complete example: multi-factor strategy analysis
from finlab import data
from finlab.tools import factor_analysis as fa
from finlab.backtest import sim
# Step 1: build the factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')
factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # small market cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # momentum
# Step 2: build features and excess-return labels, as described in the API reference above.
# Resampling on the revenue index is an assumption for this example.
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3},
    resample=revenue.index)
# Step 3: analyze each factor
print("=== Factor return analysis ===")
factor_return = fa.calc_factor_return(features, labels)
ic = fa.calc_ic(features.astype(float), labels)
for name in features.columns:
    print(f"\n{name}:")
    print(f"  Mean return: {factor_return[name].mean():.2%}")
    print(f"  Mean IC: {ic[name].mean():.4f}")
# Step 4: compute Shapley values (contribution)
print("\n=== Shapley value analysis ===")
shapley = fa.calc_shapley_values(features, labels)
for name, value in shapley.mean().items():
    print(f"{name} contribution: {value:.4f}")
# Step 5: backtest the strategy
position = factor1 & factor2 & factor3
report = sim(position, resample='M')
report.display()
print("\n=== Strategy performance ===")
# The metric key names below are kept from the original example and may differ by finlab version
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")
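FAQ
Q: How do I tell whether a factor is effective?
Judge it on three metrics together:
# Assumes `features` and `labels` come from fa.generate_features_and_labels
# and that 'small_cap' is one of the factor columns
name = 'small_cap'
# 1. Factor return (should be clearly > 0)
factor_return = fa.calc_factor_return(features, labels)[name]
print(f"Mean factor return: {factor_return.mean():.2%}")
# 2. IC (should be > 0.02)
ic = fa.calc_ic(features.astype(float), labels)[name]
print(f"Mean IC: {ic.mean():.4f}")
# 3. IC_IR (should be > 0.5)
ic_ir = ic.mean() / ic.std()
print(f"IC_IR: {ic_ir:.2f}")
# Conclusion: treat the factor as effective only if all three checks pass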
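Q: How can Shapley values be used to optimize a strategy?
# Assumes `factors` is the dict of wide boolean DataFrames passed to
# fa.generate_features_and_labels, which returned `features` and `labels`
shapley = fa.calc_shapley_values(features, labels)
mean_shapley = shapley.mean()
# Drop factors whose mean contribution is low or negative
threshold = 0.001
valid_names = mean_shapley[mean_shapley > threshold].index.tolist()
print(f"Original number of factors: {features.shape[1]}")
print(f"Number of factors after pruning: {len(valid_names)}")
# Combine the remaining factors into one position
optimized_position = factors[valid_names[0]]
for name in valid_names[1:]:
    optimized_position = optimized_position & factors[name]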
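Q: What is the difference between IC and factor return?
- IC (Information Coefficient):
  - Measures the correlation between factor values and future returns
  - Ranges over [-1, 1]; the closer to 1, the better (a negative IC means the factor works in reverse)
  - Suited to continuous factors (e.g. market cap, ROE)
- Factor return:
  - Measures the average return of the stocks the factor selects
  - Suited to binary factors (True/False)
  - Directly reflects the stock-selection result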
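# Continuous factor: use IC (features/labels built as in the API reference above; 'M' resampling is an assumption)
continuous_factor = data.get('price_earning_ratio:本益比')
features, labels = fa.generate_features_and_labels({'pe': continuous_factor}, resample='M')
ic = fa.calc_ic(features, labels)
# Binary factor: use factor return (lowest 30% of P/E on each date)
binary_factor = continuous_factor.rank(pct=True, axis=1) < 0.3
features, labels = fa.generate_features_and_labels({'low_pe': binary_factor}, resample='M')
factor_return = fa.calc_factor_return(features, labels)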