
finlab.tools

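Utility module providing event study and factor analysis tools.

Use cases

  • Analyze how specific events affect stock prices (event study)
  • Evaluate a factor's stock-selection ability (factor analysis)
  • Optimize the factor combination in a strategy
  • Understand the sources of strategy performance

Quick examples

Event study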

from finlab.tools import event_study

# Analyze price performance after revenue announcements
# (To be added: requires actual event data)

Factor analysis

from finlab import data
from finlab.tools import factor_analysis as fa

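# Load the raw data
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')

# Build boolean factors
cond1 = marketcap.rank(pct=True, axis=1) < 0.3  # small cap
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth

# Stack the factors into (date, stock) features and build excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': cond1, 'revenue_growth': cond2}, resample='M'
)

# Compute per-period factor returns, one column per factor
factor_return = fa.calc_factor_return(features, labels)
print(factor_return)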

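Detailed tutorial

See the factor analysis tutorial for:

  • The complete factor analysis workflow
  • Factor return calculation
  • Factor IC analysis
  • Shapley value contribution analysis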


API Reference

event_study()

finlab.tools.event_study

create_factor_data

create_factor_data(factor, adj_close, days=None, event=None)

create factor data, which contains future return

PARAMETER DESCRIPTION
factor

factor data where index is datetime and columns is asset id

TYPE: DataFrame

adj_close

adj close where index is datetime and columns is asset id

TYPE: DataFrame

days

future return considered

TYPE: list[int] DEFAULT: None

Return

Analytic plots and tables

Warning

This function is not identical to finlab.ml.alphalens.create_factor_data

Examples:

Ex-rights trading date analysis
import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study

factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')

# create event dataframe
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()

# calculate factor_data
factor_data = create_factor_data({'pb':factor}, adj_close, event=event)

r = event_study(factor_data, benchmark, adj_close)

plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)

event_study

event_study(factor_data, benchmark_adj_close, stock_adj_close, sample_period=(-45, -20), estimation_period=(-5, 20), plot=True)

Run event study and returns the abnormal returns of each stock on each day.

PARAMETER DESCRIPTION
factor_data

factor data where index is datetime and columns is asset id

TYPE: DataFrame

benchmark_adj_close

benchmark for CAPM

TYPE: DataFrame

stock_adj_close

stock price for CAPM

TYPE: DataFrame

sample_period

period for fitting CAPM

TYPE: (int, int) DEFAULT: (-45, -20)

estimation_period

period for calculating alpha (abnormal return)

TYPE: (int, int) DEFAULT: (-5, 20)

plot

plot the result

TYPE: bool DEFAULT: True

Return

Abnormal returns of each stock on each day.
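
The parameter descriptions above suggest a market-model event study: fit a regression of stock returns on benchmark returns over sample_period, then measure the residual over estimation_period. The following is a minimal NumPy sketch of that idea on synthetic data, not finlab's implementation. The helper name `abnormal_returns`, the synthetic returns, and the choice to treat both periods as inclusive offsets are assumptions made for illustration.

```python
import numpy as np

def abnormal_returns(stock_ret, bench_ret, fit_window, event_window):
    """Fit stock = alpha + beta * benchmark on fit_window; return residuals on event_window."""
    beta, alpha = np.polyfit(bench_ret[fit_window], stock_ret[fit_window], deg=1)
    expected = alpha + beta * bench_ret[event_window]
    return stock_ret[event_window] - expected

# Synthetic daily returns: the stock tracks the benchmark exactly,
# except for a 5% jump on the event day (position 50).
rng = np.random.default_rng(0)
bench = rng.normal(0.0, 0.01, size=71)
stock = 0.0002 + 1.2 * bench
event_pos = 50
stock[event_pos] += 0.05

# Offsets relative to the event day, mirroring the documented defaults.
# Assumption: both ends of each period are inclusive.
sample_period = (-45, -20)
estimation_period = (-5, 20)
fit_window = slice(event_pos + sample_period[0], event_pos + sample_period[1] + 1)
event_window = slice(event_pos + estimation_period[0], event_pos + estimation_period[1] + 1)

ar = abnormal_returns(stock, bench, fit_window, event_window)
offsets = np.arange(estimation_period[0], estimation_period[1] + 1)
for offset, value in zip(offsets, ar):
    print(f"day {offset:+d}: abnormal return {value:.4f}")
```

In this toy setup the abnormal return is close to zero on every day except the event day, where the injected jump shows up in full.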


plot_event_study

plot_event_study(returns)

Plot the event study for the given returns.

PARAMETER DESCRIPTION
returns

A DataFrame containing the returns data.

TYPE: DataFrame

Return

ax (matplotlib.axes.Axes): The axes object containing the plot.

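Event study: analyzes how specific events affect stock prices.

Usage

An event study measures the short- and long-term price impact of specific events, such as earnings announcements, investor conferences, and dividend payouts.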


factor_analysis

finlab.tools.factor_analysis

Factor analysis toolkit -- public facade.

All public symbols are re-exported from focused submodules so that existing from finlab.tools.factor_analysis import ... imports keep working unchanged.

Submodules

  • finlab.tools.factor_metrics -- IC, correlation, NDCG scoring
  • finlab.tools.factor_returns -- boolean factor returns & Shapley values
  • finlab.tools.factor_centrality -- PCA-based rolling centrality
  • finlab.tools.factor_regression -- OLS trend analysis

calc_centrality

calc_centrality(return_df, window_periods, n_components=1)

Compute rolling PCA centrality over return_df.

PARAMETER DESCRIPTION
return_df

Time-series DataFrame (dates x assets/factors).

TYPE: DataFrame

window_periods

Rolling window length in rows.

TYPE: int

n_components

Number of PCA components.

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
DataFrame

DataFrame of rolling centrality scores.

calc_factor_return

calc_factor_return(features, labels)

Compute equal-weight portfolio returns per boolean factor.

Each column in features must be boolean. For every date the mean label value across selected (True) stocks is returned.

PARAMETER DESCRIPTION
features

Boolean feature DataFrame (MultiIndex: date x stock).

TYPE: DataFrame

labels

Excess-return labels with the same MultiIndex.

TYPE: Series

RETURNS DESCRIPTION
DataFrame

DataFrame of per-period returns, one column per factor, starting from the first fully-populated date.

RAISES DESCRIPTION
ValueError

If any feature column is not boolean.
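
To make the computation concrete, here is a minimal pandas sketch of the idea described above: for each date, average the label over the stocks each boolean factor selects. It is not finlab's implementation. The toy data, the column names, and the helper `factor_return_sketch` are illustrative assumptions.

```python
import pandas as pd

# Toy (date, stock) panel with two boolean factors and one label per row.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["1101", "2330", "2603"]],
    names=["date", "stock"],
)
features = pd.DataFrame(
    {
        "small_cap": [True, False, True, False, True, True],
        "momentum": [False, True, True, True, False, False],
    },
    index=index,
)
labels = pd.Series([0.02, -0.01, 0.04, 0.03, 0.01, -0.03], index=index)

def factor_return_sketch(features, labels):
    """Mean label of the selected (True) stocks, per date and per factor."""
    if not all(features.dtypes == bool):
        raise ValueError("every feature column must be boolean")
    per_factor = {
        name: labels[features[name]].groupby(level="date").mean()
        for name in features.columns
    }
    return pd.DataFrame(per_factor)

returns = factor_return_sketch(features, labels)
print(returns)
```

Each row of the result is one date and each column one factor, which matches the shape described in the return value above.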

calc_ic

calc_ic(features, labels, rank=False)

Compute per-date IC between features and labels.

PARAMETER DESCRIPTION
features

MultiIndex DataFrame (date, stock) with factor columns.

TYPE: DataFrame

labels

MultiIndex Series with the same index.

TYPE: Series

rank

If True, rank features before computing correlation.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
DataFrame

DataFrame of IC values per date and factor.
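
As a rough illustration of what a per-date IC means, the sketch below computes the cross-sectional Pearson correlation between each factor column and the labels, one date at a time. It is a simplified stand-in under assumed toy data, not finlab's code. The helper name `ic_sketch` is hypothetical, and its `rank` flag ranks within each date, which is one plausible reading of the documented parameter.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["A", "B", "C", "D"]],
    names=["date", "stock"],
)
# The factor lines up with the labels on the first date and is reversed on the second.
features = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]}, index=index)
labels = pd.Series([0.01, 0.02, 0.03, 0.04, 0.01, 0.02, 0.03, 0.04], index=index)

def ic_sketch(features, labels, rank=False):
    """Per-date correlation between every feature column and the labels."""
    if rank:
        # Assumption: ranking is done cross-sectionally, within each date.
        features = features.groupby(level="date").rank()
    rows = {}
    for date, group in features.groupby(level="date"):
        rows[date] = group.corrwith(labels.reindex(group.index))
    return pd.DataFrame(rows).T

ic = ic_sketch(features, labels)
rank_ic = ic_sketch(features, labels, rank=True)
print(ic)
```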

calc_metric

calc_metric(factor, adj_close, days=None, func=corr)

Compute a cross-sectional metric between factor and forward returns.

PARAMETER DESCRIPTION
factor

Single factor DataFrame or dict of named factor DataFrames.

TYPE: DataFrame | dict[str, DataFrame]

adj_close

Adjusted close prices.

TYPE: DataFrame

days

Forward-return horizons (default [10, 20, 60, 120]).

TYPE: list[int] | None DEFAULT: None

func

Scoring callable applied per date group (default corr).

TYPE: Callable[[DataFrame], float] DEFAULT: corr

RETURNS DESCRIPTION
DataFrame

DataFrame with one column per (factor_name, horizon) pair.
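
The forward-return horizons in days can be pictured with a short pandas sketch. It assumes the usual definition of an n-day forward return, price n rows ahead divided by today's price minus one. The helper `forward_returns` and the toy prices are hypothetical, and finlab may compute its horizons differently.

```python
import pandas as pd

adj_close = pd.DataFrame(
    {"A": [10.0, 11.0, 12.1, 13.31], "B": [20.0, 20.0, 20.0, 20.0]},
    index=pd.date_range("2024-01-01", periods=4),
)

def forward_returns(adj_close, days):
    """Map each horizon d to the d-row-ahead return of every asset."""
    return {d: adj_close.shift(-d) / adj_close - 1 for d in days}

fwd = forward_returns(adj_close, days=[1, 2])
print(fwd[1])
print(fwd[2])
```

The last d rows of each horizon are NaN because no future price exists yet, so any metric built on them has to skip those dates.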

calc_regression_stats

calc_regression_stats(df, p_value_threshold=0.05, r_squared_threshold=0.1)

Run per-column OLS regressions and classify each trend.

PARAMETER DESCRIPTION
df

Time-series DataFrame with a DatetimeIndex.

TYPE: DataFrame

p_value_threshold

Significance level for trend classification.

TYPE: float DEFAULT: 0.05

r_squared_threshold

Minimum R-squared for a non-flat trend.

TYPE: float DEFAULT: 0.1

RETURNS DESCRIPTION
DataFrame

DataFrame with columns slope, intercept, r_squared, p_value, tail_estimate, and trend ("up"/"down"/"flat").

RAISES DESCRIPTION
ValueError

If the DataFrame index is not a DatetimeIndex.
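
The following sketch shows the general shape of such a trend classification for a single column, using a least-squares line fitted with NumPy. It is a simplification under stated assumptions: the p-value test and the tail_estimate column are left out, only the R-squared threshold is applied, and the helper `classify_trend` is a hypothetical name rather than part of finlab.

```python
import numpy as np

def classify_trend(values, r_squared_threshold=0.1):
    """Fit y = slope * t + intercept and label the trend by fit quality and slope sign."""
    x = np.arange(len(values), dtype=float)
    y = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    if r_squared < r_squared_threshold:
        trend = "flat"  # the line explains too little of the variation
    else:
        trend = "up" if slope > 0 else "down"
    return {"slope": slope, "intercept": intercept, "r_squared": r_squared, "trend": trend}

rising = classify_trend([1.0, 1.1, 1.2, 1.3, 1.4])
noisy = classify_trend([1.0, 1.2, 0.9, 1.1, 1.0])
print(rising["trend"], noisy["trend"])
```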

calc_shapley_values

calc_shapley_values(features, labels)

Compute Shapley values measuring each factor's marginal contribution.

Enumerates all 2^n subsets of factors, computes the equal-weight portfolio return of each subset, and distributes the return evenly among the factors in that subset.

PARAMETER DESCRIPTION
features

Boolean feature DataFrame (MultiIndex: date x stock).

TYPE: DataFrame

labels

Excess-return labels with the same MultiIndex.

TYPE: Series

RETURNS DESCRIPTION
DataFrame

DataFrame of Shapley values, one column per factor.

RAISES DESCRIPTION
ValueError

If features is empty, labels lacks a MultiIndex, or any feature column is not boolean.
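
For readers unfamiliar with Shapley values, the sketch below implements the textbook formula over all factor subsets in plain Python. It is meant only to show how a marginal contribution is averaged across subsets. The distribution rule finlab applies (described above) may differ, and the subset returns in the dictionary are made-up numbers.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Textbook Shapley value: weighted average of each player's marginal contribution."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result

# Hypothetical average returns of the portfolio selected by each factor combination.
subset_return = {
    frozenset(): 0.0,
    frozenset({"size"}): 0.010,
    frozenset({"momentum"}): 0.020,
    frozenset({"size", "momentum"}): 0.040,
}

phi = shapley_values(["size", "momentum"], subset_return.__getitem__)
print(phi)
```

The values sum to the return of the full combination, which is the property that makes Shapley values usable as a contribution breakdown.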

corr

corr(df)

Pearson correlation between the first two columns of df.

generate_features_and_labels

generate_features_and_labels(dfs, resample)

Build a feature matrix and an excess-return label vector.

Wraps finlab.ml.feature.combine and finlab.ml.label.excess_over_mean.

PARAMETER DESCRIPTION
dfs

Dict mapping factor names to DataFrames (or callables).

TYPE: dict[str, DataFrame | Callable[[], DataFrame]]

resample

Resampling frequency / index passed to the ML helpers.

TYPE: str

RETURNS DESCRIPTION
tuple[DataFrame, Series]

(features, labels) tuple.

ic

ic(factor, adj_close, days=None)

Shorthand for calc_metric(factor, adj_close, days, func=corr).

is_boolean_series

is_boolean_series(series)

Check whether series contains only boolean-like values (including NaN).

Handles the case where older pandas converts bool + NaN to float.
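
The dtype issue mentioned here is easy to reproduce: once a boolean column contains NaN, pandas can no longer store it as bool. A minimal sketch of a tolerant check is shown below. The helper `looks_boolean` is a hypothetical stand-in, not finlab's is_boolean_series.

```python
import numpy as np
import pandas as pd

def looks_boolean(series):
    """True if every non-missing value equals True or False (1.0 and 0.0 count)."""
    values = series.dropna().unique()
    return all(v in (True, False) for v in values)

clean = pd.Series([True, False, True])
with_nan = pd.Series([1.0, 0.0, np.nan])  # what a bool column can look like after NaN is introduced
continuous = pd.Series([0.5, 1.0, 0.0])

print(with_nan.dtype)
print(looks_boolean(clean), looks_boolean(with_nan), looks_boolean(continuous))
```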

ndcg_k

ndcg_k(k)

Return an NDCG scorer truncated at rank k.
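
As background, NDCG at rank k compares the discounted gain of the top-k items as ordered by a score against the best possible ordering. The plain-Python sketch below uses the common log2 discount. It assumes non-negative relevance values, and how finlab maps returns to relevance is not specified here, so treat `ndcg_at_k` as an illustration only.

```python
import math

def ndcg_at_k(scores, relevance, k):
    """NDCG of the top-k items ranked by score, with a log2 position discount."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    dcg = sum(relevance[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

relevance = [3.0, 2.0, 1.0]
perfect = ndcg_at_k([0.9, 0.5, 0.1], relevance, k=2)   # scores agree with relevance
reversed_ = ndcg_at_k([0.1, 0.5, 0.9], relevance, k=2)  # scores rank the worst item first
print(perfect, reversed_)
```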

precision_at_rank

precision_at_rank(k)

Return a precision scorer that evaluates the top 1 - k quantile.

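Factor analysis module for evaluating a factor's stock-selection ability and contribution.

Main functions

Function | Description | Purpose
calc_factor_return() | Computes factor returns | Evaluate a factor's overall performance
calc_ic() | Computes the factor IC (information coefficient) | Measure the correlation between a factor and future returns
calc_shapley_values() | Computes Shapley values | Measure each factor's marginal contribution in a multi-factor model
calc_centrality() | Computes rolling PCA centrality | Score each column of a return DataFrame over a rolling window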

calc_factor_return()

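Computes factor returns to evaluate a factor's stock-selection ability.

Example

from finlab import data
from finlab.tools import factor_analysis as fa

# Build the factor: small market cap
marketcap = data.get('etl:market_value')
factor = marketcap.rank(pct=True, axis=1) < 0.3

# Stack the factor into (date, stock) features and build excess-return labels
features, labels = fa.generate_features_and_labels({'small_cap': factor}, resample='M')

# Compute factor returns (DataFrame, one column per factor)
factor_return = fa.calc_factor_return(features, labels)

# Inspect the results for the single factor
small_cap_return = factor_return['small_cap']
print(f"Mean factor return: {small_cap_return.mean():.2%}")
print(f"Factor return std: {small_cap_return.std():.2%}")
print(f"Factor Sharpe ratio: {small_cap_return.mean() / small_cap_return.std():.2f}")

Interpretation:

  • Positive: the factor has selection ability (the selected stocks outperform)
  • Negative: the factor works in reverse (consider taking the opposite side)
  • Near 0: the factor has no selection ability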

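Best practices

  • Backtest over a long period to verify that the factor is stable
  • Compare factor performance across market regimes (bull and bear markets)
  • Confirm the factor's effectiveness with IC analysis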

calc_ic()

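Computes the factor IC (information coefficient), which measures the correlation between a factor and future returns.

Example

from finlab import data
from finlab.tools import factor_analysis as fa

# Build the factor: revenue growth
revenue = data.get('monthly_revenue:當月營收')
factor = revenue.average(3) / revenue.average(12)  # 3-month average vs. 12-month average

# Stack the factor into (date, stock) features and build excess-return labels
features, labels = fa.generate_features_and_labels({'revenue_growth': factor}, resample='M')

# Compute the per-date IC and pick the factor's column
ic = fa.calc_ic(features, labels)['revenue_growth']

# Inspect the results
print(f"Mean IC: {ic.mean():.4f}")
print(f"IC std: {ic.std():.4f}")
print(f"IC_IR (IC Sharpe ratio): {ic.mean() / ic.std():.2f}")

# Plot the IC time series
ic.plot(title='Revenue growth factor IC')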

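IC evaluation criteria

IC value | Rating | Notes
> 0.05 | Excellent | Strong selection ability
0.03 ~ 0.05 | Good | Has selection ability
0.01 ~ 0.03 | Fair | Weak selection ability
< 0.01 | Ineffective | No selection ability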

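Notes on IC analysis

  • IC > 0: the factor is positively correlated with future returns (go long high-score stocks)
  • IC < 0: the factor is negatively correlated with future returns (go long low-score stocks)
  • Highly volatile IC: the factor is unstable and carries more risk
  • IC_IR > 0.5: the factor performs well on a risk-adjusted basis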
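
IC_IR is simply the mean of the IC series divided by its standard deviation. A small worked example with made-up IC values (they are not results from any real factor) shows the arithmetic using only the standard library.

```python
from statistics import mean, stdev

# Hypothetical per-period IC values for one factor.
ic_series = [0.04, 0.06, -0.01, 0.05, 0.03, 0.07]

ic_mean = mean(ic_series)
ic_std = stdev(ic_series)  # sample standard deviation
ic_ir = ic_mean / ic_std

print(f"Mean IC: {ic_mean:.4f}")
print(f"IC std: {ic_std:.4f}")
print(f"IC_IR: {ic_ir:.2f}")
```

A single negative period barely moves the mean here, but it does widen the standard deviation, which is why IC_IR rewards consistency as well as level.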

calc_shapley_values()

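Computes Shapley values to measure each factor's marginal contribution in a multi-factor model.

Example

from finlab import data
from finlab.tools import factor_analysis as fa

# Build several boolean factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # momentum

# Stack the factors into (date, stock) features and build excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3}, resample='M'
)

# Compute Shapley values (DataFrame, one column per factor)
shapley = fa.calc_shapley_values(features, labels)

# Inspect the average Shapley value of each factor
print("Shapley value per factor:")
for name, value in shapley.mean().items():
    print(f"{name}: {value:.4f}")

Interpretation:

  • High Shapley value: the factor contributes a lot and is important
  • Low Shapley value: the factor contributes little and is a candidate for removal
  • Negative Shapley value: the factor is harmful and drags down overall performance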

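Recommendations

  • Remove factors whose Shapley value is negative or close to 0
  • Keep factors with high Shapley values
  • Re-evaluate strategy performance after removing factors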

calc_centrality()

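Computes rolling PCA centrality over a DataFrame of returns (dates x assets or factors).

Example

from finlab import data
from finlab.tools import factor_analysis as fa

# Build boolean factors
marketcap = data.get('etl:market_value')
close = data.get('price:收盤價')
factors = {
    'small_cap': marketcap.rank(pct=True, axis=1) < 0.3,
    'momentum': (close / close.shift(20)).rank(pct=True, axis=1) > 0.7,
}

# Compute per-period factor returns, then their rolling centrality
features, labels = fa.generate_features_and_labels(factors, resample='M')
factor_return = fa.calc_factor_return(features, labels)
centrality = fa.calc_centrality(factor_return, window_periods=12)  # 12-row rolling window

print(centrality.tail())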

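Interpretation (read per column of the return DataFrame passed in):

  • High centrality: the column's returns are closely tied to the dominant principal component(s), so it adds little diversification
  • Low centrality: the column's returns are more independent, so it diversifies better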


Complete example: multi-factor strategy analysis

from finlab import data
from finlab.tools import factor_analysis as fa
from finlab.backtest import sim

# Step 1: build the factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # momentum

# Step 2: build (date, stock) features and excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3}, resample='M'
)

# Step 3: analyze each factor
print("=== Factor return analysis ===")
factor_return = fa.calc_factor_return(features, labels)
ic = fa.calc_ic(features.astype(float), labels)  # cast booleans to 0/1 for correlation
for name in features.columns:
    print(f"\n{name}:")
    print(f"  Mean return: {factor_return[name].mean():.2%}")
    print(f"  Mean IC: {ic[name].mean():.4f}")

# Step 4: compute Shapley values (contribution)
print("\n=== Shapley values analysis ===")
shapley = fa.calc_shapley_values(features, labels)
for name, value in shapley.mean().items():
    print(f"{name} contribution: {value:.4f}")

# Step 5: backtest the strategy
position = factor1 & factor2 & factor3
report = sim(position, resample='M')
report.display()

print("\n=== Strategy performance ===")
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")

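FAQ

Q: How do I tell whether a factor is effective?

Judge it on three metrics together:

# features and labels come from
# fa.generate_features_and_labels({'my_factor': factor}, resample='M'),
# where 'my_factor' is a placeholder name for the boolean factor under test

# 1. Factor return (should be clearly > 0)
factor_return = fa.calc_factor_return(features, labels)['my_factor']
print(f"Mean factor return: {factor_return.mean():.2%}")

# 2. IC (should be > 0.02)
ic = fa.calc_ic(features.astype(float), labels)['my_factor']
print(f"Mean IC: {ic.mean():.4f}")

# 3. IC_IR (should be > 0.5)
ic_ir = ic.mean() / ic.std()
print(f"IC_IR: {ic_ir:.2f}")

# Conclusion: treat the factor as effective only if all three checks pass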

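Q: How can Shapley values be used to optimize a strategy?

# Compute Shapley values for all factors (factor1 ... factor5 are boolean DataFrames)
factors = {
    'factor1': factor1, 'factor2': factor2, 'factor3': factor3,
    'factor4': factor4, 'factor5': factor5,
}
features, labels = fa.generate_features_and_labels(factors, resample='M')
shapley = fa.calc_shapley_values(features, labels).mean()  # average per factor

# Drop factors with low or negative contribution
threshold = 0.001
valid_factors = [factors[name] for name, s in shapley.items() if s > threshold]

print(f"Original factor count: {len(factors)}")
print(f"Factor count after pruning: {len(valid_factors)}")

# Combine the remaining factors
optimized_position = valid_factors[0]
for factor in valid_factors[1:]:
    optimized_position = optimized_position & factor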

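Q: What is the difference between IC and factor return?

  • IC (information coefficient):
      • Measures the correlation between factor values and future returns
      • Ranges over [-1, 1]; the closer to 1, the better
      • Suited to continuous factors (e.g. market cap, ROE)

  • Factor return:
      • Measures the average return of the stocks the factor selects
      • Suited to binary factors (True/False)
      • Directly reflects the selection result

# Use IC for a continuous factor
continuous_factor = data.get('price_earning_ratio:本益比')
features, labels = fa.generate_features_and_labels({'pe': continuous_factor}, resample='M')
ic = fa.calc_ic(features, labels)

# Use factor return for a binary factor (lowest 30% P/E on each date)
binary_factor = continuous_factor.rank(pct=True, axis=1) < 0.3
features, labels = fa.generate_features_and_labels({'low_pe': binary_factor}, resample='M')
factor_return = fa.calc_factor_return(features, labels)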

References