finlab.tools

Utility module providing event study and factor analysis tools.

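Use cases

Analyze how specific events affect stock prices (event study)
Evaluate a factor's stock-selection ability (factor analysis)
Optimize a strategy's factor combination
Understand where a strategy's performance comes from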

Quick examples

Event study

from finlab.tools import event_study

# Analyze stock performance after revenue announcements
# (To be added: requires actual event data)

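Factor analysis

from finlab import data
from finlab.tools import factor_analysis as fa

# Prepare the raw data
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')

# Build boolean factors
cond1 = marketcap.rank(pct=True, axis=1) < 0.3  # small market cap
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth

# Build features (MultiIndex: date x stock) and excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': cond1, 'revenue_growth': cond2}, resample='M')

# Compute factor returns (one column per factor)
factor_return = fa.calc_factor_return(features, labels)
print(factor_return)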

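Detailed tutorial

See the factor analysis tutorial to learn about:
The complete factor analysis workflow
Factor return calculation
Factor IC analysis
Shapley value contribution analysis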

API Reference

event_study()

finlab.tools.event_study

create_factor_data

create_factor_data(factor, adj_close, days=None, event=None)

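Create factor data that includes future returns.

PARAMETER	DESCRIPTION
`factor`	Factor data where the index is datetime and the columns are asset IDs; the example below passes a dict of named factors TYPE: `DataFrame | dict[str, DataFrame]`
`adj_close`	Adjusted close prices where the index is datetime and the columns are asset IDs TYPE: `DataFrame`
`days`	Future-return horizons to compute TYPE: `list[int]` DEFAULT: `None`
`event`	Boolean DataFrame that is True on each event date, with a datetime index and asset ID columns (as built in the example below) TYPE: `DataFrame` DEFAULT: `None`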

Return

Factor data containing the factor values and their future returns, to be passed to event_study.

Warning

This function is not identical to finlab.ml.alphalens.create_factor_data
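To make "factor data with future returns" concrete, here is a minimal pandas sketch: it computes simple forward returns over each horizon, optionally keeps factor values only on event dates, and stacks everything into one (date, asset) table. The helper names, the `1D`-style column labels, and dropping incomplete rows are assumptions for illustration, not the output format of finlab's create_factor_data.

```python
import pandas as pd


def forward_returns(adj_close: pd.DataFrame, days: list) -> dict:
    """Simple forward return for each horizon d: P[t + d] / P[t] - 1 (hypothetical helper)."""
    return {d: adj_close.shift(-d) / adj_close - 1 for d in days}


def sketch_factor_data(factor: pd.DataFrame, adj_close: pd.DataFrame,
                       days: list, event=None) -> pd.DataFrame:
    """Long (date, asset) table with the factor and one column per horizon (hypothetical helper)."""
    if event is not None:
        # Keep factor values only where `event` is True; other cells become NaN
        mask = event.reindex(index=factor.index, columns=factor.columns, fill_value=False)
        factor = factor.where(mask.astype(bool))
    cols = {'factor': factor.stack()}
    for d, ret in forward_returns(adj_close, days).items():
        cols[f'{d}D'] = ret.stack()
    # Align on (date, asset) and drop rows missing the factor or any horizon
    return pd.DataFrame(cols).dropna()
```

Passing an `event` mask shrinks the table to event dates only, which is what an event study needs as input.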

Examples:

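Cash capital increase/reduction analysis

import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study

factor = data.get('price_earning_ratio:股價淨值比')  # price-to-book ratio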
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')

# create event dataframe
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()

# calculate factor_data
factor_data = create_factor_data({'pb':factor}, adj_close, event=event)

r = event_study(factor_data, benchmark, adj_close)

plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)

event_study

event_study(factor_data, benchmark_adj_close, stock_adj_close, sample_period=(-45, -20), estimation_period=(-5, 20), plot=True)

Run an event study and return the abnormal returns of each stock on each day.

PARAMETER	DESCRIPTION
`factor_data`	Factor data where the index is datetime and the columns are asset IDs TYPE: `DataFrame`
`benchmark_adj_close`	Benchmark adjusted close prices, used as the market return in the CAPM TYPE: `DataFrame`
`stock_adj_close`	Stock adjusted close prices used in the CAPM TYPE: `DataFrame`
`sample_period`	Window, in days relative to the event, used to fit the CAPM TYPE: `(int, int)` DEFAULT: `(-45, -20)`
`estimation_period`	Window, in days relative to the event, over which the abnormal return (alpha) is calculated TYPE: `(int, int)` DEFAULT: `(-5, 20)`
`plot`	plot the result TYPE: `bool` DEFAULT: `True`

Return

Abnormal returns of each stock on each day.
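The parameters above describe a market-model (CAPM-style) event study: fit the stock's returns on the benchmark's over `sample_period`, then measure actual minus expected returns over `estimation_period`. Below is a minimal numpy/pandas sketch for one stock and one event. It assumes the inputs are daily returns on a shared trading-day index, that the fitting window is half-open and the abnormal-return window inclusive, and uses the hypothetical name `abnormal_returns`; finlab's boundary conventions may differ.

```python
import numpy as np
import pandas as pd


def abnormal_returns(stock_ret: pd.Series, bench_ret: pd.Series, event_pos: int,
                     sample_period=(-45, -20), estimation_period=(-5, 20)) -> pd.Series:
    """Market-model abnormal returns for one stock around one event.

    stock_ret, bench_ret: daily returns on the same trading-day index.
    event_pos: integer position of the event day within that index.
    """
    s0, s1 = sample_period
    fit = slice(event_pos + s0, event_pos + s1)  # assumed half-open [s0, s1)
    # Least-squares fit of r_stock = alpha + beta * r_bench
    beta, alpha = np.polyfit(bench_ret.iloc[fit].to_numpy(), stock_ret.iloc[fit].to_numpy(), 1)

    e0, e1 = estimation_period
    win = slice(event_pos + e0, event_pos + e1 + 1)  # assumed inclusive [e0, e1]
    expected = alpha + beta * bench_ret.iloc[win].to_numpy()
    abnormal = stock_ret.iloc[win].to_numpy() - expected
    # Index the result by day relative to the event
    return pd.Series(abnormal, index=range(e0, e1 + 1))
```

Keeping the fitting window well before the event (here ending 20 days earlier) stops the event itself from contaminating the estimated beta.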


plot_event_study

plot_event_study(returns)

Plot the event study for the given returns.

PARAMETER	DESCRIPTION
`returns`	A DataFrame containing the returns data. TYPE: `DataFrame`

Return

ax (matplotlib.axes.Axes): The axes object containing the plot.
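The create_factor_data example plots `r.mean()` as bars and `r.mean().cumsum()` as a line, that is, the average abnormal return (AAR) and the cumulative average abnormal return (CAAR) per relative day. A small pandas sketch of those summary series follows; the cross-sectional t-statistic is a common addition and an assumption here, not something plot_event_study is documented to compute.

```python
import numpy as np
import pandas as pd


def summarize_abnormal_returns(r: pd.DataFrame) -> pd.DataFrame:
    """AAR, CAAR and a cross-sectional t-statistic per relative day (hypothetical helper).

    r: one row per event, one column per day relative to the event.
    """
    aar = r.mean()                               # average abnormal return per day
    caar = aar.cumsum()                          # cumulative AAR
    se = r.std(ddof=1) / np.sqrt(r.count())      # standard error of the mean
    return pd.DataFrame({'AAR': aar, 'CAAR': caar, 't': aar / se})
```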

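Event study: analyze the impact of specific events on stock prices.

Usage notes

An event study measures the short- and long-term impact of a specific event (such as an earnings release, an investor conference, or a dividend payment) on stock prices.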

factor_analysis

finlab.tools.factor_analysis

Factor analysis toolkit -- public facade.

All public symbols are re-exported from focused submodules, so existing `from finlab.tools.factor_analysis import ...` imports keep working unchanged.

Submodules

finlab.tools.factor_metrics -- IC, correlation, NDCG scoring
finlab.tools.factor_returns -- boolean factor returns & Shapley values
finlab.tools.factor_centrality -- PCA-based rolling centrality
finlab.tools.factor_regression -- OLS trend analysis

calc_centrality

calc_centrality(return_df, window_periods, n_components=1)

Compute rolling PCA centrality over return_df.

PARAMETER	DESCRIPTION
`return_df`	Time-series DataFrame (dates x assets/factors). TYPE: `DataFrame`
`window_periods`	Rolling window length in rows. TYPE: `int`
`n_components`	Number of PCA components. TYPE: `int` DEFAULT: `1`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of rolling centrality scores.
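The docstring does not spell out the centrality formula. One common PCA-based definition (absorption-ratio-style centrality) weights each column's absolute loading on the top `n_components` eigenvectors by the share of variance each component explains. The sketch below implements that definition under stated assumptions: no missing values in `return_df`, the covariance (not correlation) matrix, and the hypothetical name `rolling_pca_centrality`. finlab's calc_centrality may use a different formula.

```python
import numpy as np
import pandas as pd


def rolling_pca_centrality(return_df: pd.DataFrame, window_periods: int,
                           n_components: int = 1) -> pd.DataFrame:
    """Rolling centrality: each column's share of the top principal components,
    weighted by the fraction of variance each component explains."""
    rows, dates = [], []
    for end in range(window_periods, len(return_df) + 1):
        x = return_df.iloc[end - window_periods:end].to_numpy(dtype=float)
        cov = np.cov(x, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
        top = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest components
        explained = eigvals[top] / eigvals.sum()
        loadings = np.abs(eigvecs[:, top])
        loadings = loadings / loadings.sum(axis=0)      # each component's weights sum to 1
        rows.append(loadings @ explained / explained.sum())
        dates.append(return_df.index[end - 1])          # label by the window's last date
    return pd.DataFrame(rows, index=dates, columns=return_df.columns)
```

Under this definition each row sums to 1, and a factor that co-moves with the dominant component scores higher than one whose returns are largely independent.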

calc_factor_return

calc_factor_return(features, labels)

Compute equal-weight portfolio returns per boolean factor.

Each column in features must be boolean. For every date the mean label value across selected (True) stocks is returned.

PARAMETER	DESCRIPTION
`features`	Boolean feature DataFrame (MultiIndex: date x stock). TYPE: `DataFrame`
`labels`	Excess-return labels with the same MultiIndex. TYPE: `Series`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of per-period returns, one column per factor, starting from the first fully-populated date.

RAISES	DESCRIPTION
`ValueError`	If any feature column is not boolean.
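The equal-weight rule above can be sketched in a few lines of pandas: for each boolean column, average the label over the stocks flagged True on each date. This sketch assumes the first index level is the date and skips the "first fully-populated date" trimming; `sketch_factor_return` is a hypothetical name, not the library function.

```python
import pandas as pd


def sketch_factor_return(features: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Equal-weight mean label of the selected (True) stocks, per date and factor."""
    out = {}
    for col in features.columns:
        selected = features[col].eq(True)               # NaN counts as not selected
        out[col] = labels[selected].groupby(level=0).mean()
    return pd.DataFrame(out)
```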

calc_ic

calc_ic(features, labels, rank=False)

Compute per-date IC between features and labels.

PARAMETER	DESCRIPTION
`features`	MultiIndex DataFrame (date, stock) with factor columns. TYPE: `DataFrame`
`labels`	MultiIndex Series with the same index. TYPE: `Series`
`rank`	If `True`, rank features before computing correlation. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of IC values per date and factor.
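A per-date IC is a cross-sectional correlation computed separately for every date. The sketch below follows the docstring literally: with `rank=True` it ranks the features within each date and then takes the Pearson correlation with the labels. The helper name `sketch_ic` is hypothetical, and the first index level is assumed to be the date.

```python
import pandas as pd


def sketch_ic(features: pd.DataFrame, labels: pd.Series, rank: bool = False) -> pd.DataFrame:
    """Per-date Pearson correlation between each factor column and the labels."""
    x = features.groupby(level=0).rank() if rank else features  # rank within each date
    out = {}
    for col in x.columns:
        pair = pd.DataFrame({'x': x[col], 'y': labels})
        out[col] = pair.groupby(level=0).apply(lambda g: g['x'].corr(g['y']))
    return pd.DataFrame(out)
```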

calc_metric

calc_metric(factor, adj_close, days=None, func=corr)

Compute a cross-sectional metric between factor and forward returns.

PARAMETER	DESCRIPTION
`factor`	Single factor DataFrame or dict of named factor DataFrames. TYPE: `DataFrame | dict[str, DataFrame]`
`adj_close`	Adjusted close prices. TYPE: `DataFrame`
`days`	Forward-return horizons (default `[10, 20, 60, 120]`). TYPE: `list[int] | None` DEFAULT: `None`
`func`	Scoring callable applied per date group (default `corr`). TYPE: `Callable[[DataFrame], float]` DEFAULT: `corr`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with one column per `(factor_name, horizon)` pair.
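With the default `corr` scorer, the metric amounts to correlating each date's row of factor values with the same row of forward returns. The wide-format sketch below does this with pandas' row-wise `DataFrame.corrwith` (Pearson). The simple-return definition, the `(name, horizon)` column layout, and the hypothetical helper name are assumptions; custom `func` scorers are not covered.

```python
import pandas as pd


def sketch_ic_by_horizon(factor: pd.DataFrame, adj_close: pd.DataFrame,
                         days=(10, 20, 60, 120), name: str = 'factor') -> pd.DataFrame:
    """Cross-sectional Pearson IC per date for each forward-return horizon."""
    out = {}
    for d in days:
        fwd = adj_close.shift(-d) / adj_close - 1       # simple forward return over d rows
        out[(name, d)] = factor.corrwith(fwd, axis=1)   # row-wise correlation per date
    return pd.DataFrame(out)
```

The last `d` rows have no forward return yet, so their IC is NaN.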

calc_regression_stats

calc_regression_stats(df, p_value_threshold=0.05, r_squared_threshold=0.1)

Run per-column OLS regressions and classify each trend.

PARAMETER	DESCRIPTION
`df`	Time-series DataFrame with a `DatetimeIndex`. TYPE: `DataFrame`
`p_value_threshold`	Significance level for trend classification. TYPE: `float` DEFAULT: `0.05`
`r_squared_threshold`	Minimum R-squared for a non-flat trend. TYPE: `float` DEFAULT: `0.1`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with columns `slope`, `intercept`, `r_squared`, `p_value`, `tail_estimate`, and `trend` (`"up"`/`"down"`/`"flat"`).

RAISES	DESCRIPTION
`ValueError`	If the DataFrame index is not a `DatetimeIndex`.
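A sketch of per-column trend classification using `scipy.stats.linregress`. Several details are assumptions not stated in the docstring: time is measured in days since the first date, `tail_estimate` is taken as the fitted value at the last date, and a trend counts as up or down only when the p-value is below its threshold and R-squared reaches its threshold. The name `sketch_regression_stats` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def sketch_regression_stats(df: pd.DataFrame, p_value_threshold: float = 0.05,
                            r_squared_threshold: float = 0.1) -> pd.DataFrame:
    if not isinstance(df.index, pd.DatetimeIndex):
        raise ValueError('index must be a DatetimeIndex')
    x = np.asarray((df.index - df.index[0]).days, dtype=float)  # days since the first date
    rows = {}
    for col in df.columns:
        y = df[col].to_numpy(dtype=float)
        ok = ~np.isnan(y)                                        # ignore missing values
        fit = stats.linregress(x[ok], y[ok])
        r2 = fit.rvalue ** 2
        significant = fit.pvalue < p_value_threshold and r2 >= r_squared_threshold
        trend = ('up' if fit.slope > 0 else 'down') if significant else 'flat'
        rows[col] = {'slope': fit.slope, 'intercept': fit.intercept, 'r_squared': r2,
                     'p_value': fit.pvalue,
                     'tail_estimate': fit.intercept + fit.slope * x[-1],
                     'trend': trend}
    return pd.DataFrame.from_dict(rows, orient='index')
```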

calc_shapley_values

calc_shapley_values(features, labels)

Compute Shapley values measuring each factor's marginal contribution.

Enumerates all 2^n subsets of factors, computes the equal-weight portfolio return of each subset, and distributes the return evenly among the factors in that subset.

PARAMETER	DESCRIPTION
`features`	Boolean feature DataFrame (MultiIndex: date x stock). TYPE: `DataFrame`
`labels`	Excess-return labels with the same MultiIndex. TYPE: `Series`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of Shapley values, one column per factor.

RAISES	DESCRIPTION
`ValueError`	If features is empty, labels lacks a MultiIndex, or any feature column is not boolean.
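The docstring describes splitting each subset's return evenly among its factors. For comparison, here is a sketch of the textbook Shapley value on a single cross-section of stocks: v(S) is the equal-weight label of stocks satisfying every factor in S, and each factor receives its permutation-weighted average marginal contribution. Using all stocks as the empty-coalition baseline and 0.0 for empty selections are assumptions, and finlab's exact allocation rule may differ; the shared ingredient is enumerating all 2^n subsets.

```python
from itertools import combinations
from math import factorial

import pandas as pd


def coalition_return(features: pd.DataFrame, labels: pd.Series, coalition) -> float:
    """v(S): equal-weight mean label of stocks where every factor in S is True.
    The empty coalition selects every stock (assumed universe baseline)."""
    mask = pd.Series(True, index=features.index)
    for col in coalition:
        mask &= features[col].eq(True)
    selected = labels[mask]
    return float(selected.mean()) if len(selected) else 0.0  # assumption: empty -> 0


def textbook_shapley(features: pd.DataFrame, labels: pd.Series) -> pd.Series:
    """Exact Shapley value of each factor for one cross-section of stocks."""
    players = list(features.columns)
    n = len(players)
    # Value of all 2^n subsets, computed once
    v = {frozenset(c): coalition_return(features, labels, c)
         for size in range(n + 1) for c in combinations(players, size)}
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for c in combinations(others, size):
                s = frozenset(c)
                total += weight * (v[s | {p}] - v[s])  # marginal contribution of p
        phi[p] = total
    return pd.Series(phi)
```

The values satisfy the efficiency property: they sum to v(all factors) minus v(empty set).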

corr

corr(df)

Pearson correlation between the first two columns of df.

generate_features_and_labels

generate_features_and_labels(dfs, resample)

Build a feature matrix and an excess-return label vector.

Wraps finlab.ml.feature.combine and finlab.ml.label.excess_over_mean.

PARAMETER	DESCRIPTION
`dfs`	Dict mapping factor names to DataFrames (or callables). TYPE: `dict[str, DataFrame | Callable[[], DataFrame]]`
`resample`	Resampling frequency / index passed to the ML helpers. TYPE: `str`

RETURNS	DESCRIPTION
`tuple[DataFrame, Series]`	`(features, labels)` tuple.
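The label side, an excess return over the cross-sectional mean, can be sketched in pandas: compute each stock's forward return, subtract that date's average across stocks, and align the result with the stacked features. The fixed `period` horizon, the lack of resampling, and the helper name are simplifying assumptions; the library's helpers may define the horizon differently.

```python
import pandas as pd


def sketch_features_and_labels(dfs: dict, adj_close: pd.DataFrame, period: int = 20):
    """Stack wide factor tables into (date, stock) features plus excess-return labels."""
    features = pd.DataFrame({name: df.stack() for name, df in dfs.items()})
    fwd = adj_close.shift(-period) / adj_close - 1
    excess = fwd.sub(fwd.mean(axis=1), axis=0)   # subtract each date's cross-sectional mean
    labels = excess.stack().reindex(features.index)
    return features, labels
```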

ic

ic(factor, adj_close, days=None)

Shorthand for calc_metric(factor, adj_close, days, func=corr).

is_boolean_series

is_boolean_series(series)

Check whether series contains only boolean-like values (including NaN).

Handles the case where older pandas converts bool + NaN to float.
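A sketch of such a check: after dropping missing values, accept a bool dtype, an object column holding only Python/NumPy booleans, or a numeric column holding only 0 and 1 (the float form older pandas produces from bool + NaN). The name `sketch_is_boolean_series` is hypothetical and the exact rules in finlab may differ.

```python
import numpy as np
import pandas as pd


def sketch_is_boolean_series(series: pd.Series) -> bool:
    """True if every non-missing value is boolean-like (True/False or 0/1)."""
    s = series.dropna()
    if pd.api.types.is_bool_dtype(s):
        return True
    if s.dtype == object:
        return all(isinstance(v, (bool, np.bool_)) for v in s)
    if pd.api.types.is_numeric_dtype(s):
        # e.g. [True, NaN] stored as [1.0, NaN]
        return bool(((s == 0) | (s == 1)).all())
    return False
```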

ndcg_k

ndcg_k(k)

Return an NDCG scorer truncated at rank k.

precision_at_rank

precision_at_rank(k)

Return a precision scorer that evaluates the top 1 - k quantile.
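Both helpers return a scorer that, like `corr`, is assumed to receive one date's DataFrame with the factor in the first column and the forward return in the second. The sketch below shows one plausible implementation of each. Using the return's percentile rank as the non-negative NDCG relevance, and measuring precision as the overlap between the factor's and the return's top (1 - k) quantiles, are assumptions rather than finlab's documented definitions.

```python
import numpy as np
import pandas as pd


def sketch_ndcg_k(k: int):
    def score(df: pd.DataFrame) -> float:
        df = df.dropna()
        pred = df.iloc[:, 0].to_numpy(dtype=float)
        rel = df.iloc[:, 1].rank(pct=True).to_numpy()   # relevance in (0, 1]
        discounts = 1.0 / np.log2(np.arange(2, k + 2))  # 1 / log2(position + 1)
        picked = rel[np.argsort(-pred)[:k]]             # relevance of the factor's top k
        ideal = np.sort(rel)[::-1][:k]                  # best achievable top k
        dcg = (picked * discounts[:len(picked)]).sum()
        idcg = (ideal * discounts[:len(ideal)]).sum()
        return float(dcg / idcg) if idcg > 0 else float('nan')
    return score


def sketch_precision_at_rank(k: float):
    def score(df: pd.DataFrame) -> float:
        df = df.dropna()
        top_pred = df.iloc[:, 0].rank(pct=True) > k     # factor's top (1 - k) quantile
        top_true = df.iloc[:, 1].rank(pct=True) > k     # return's top (1 - k) quantile
        n = int(top_pred.sum())
        return float((top_pred & top_true).sum() / n) if n else float('nan')
    return score
```

For example, with k=0.8 the precision scorer looks at the top 20% of stocks by factor value.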

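Factor analysis module for evaluating a factor's stock-selection ability and contribution.

Main functions:

Function	Description	Use
`calc_factor_return()`	Compute factor returns	Evaluate the factor's overall performance
`calc_ic()`	Compute the factor IC (information coefficient)	Evaluate the correlation between the factor and future returns
`calc_shapley_values()`	Compute Shapley values	Evaluate each factor's marginal contribution in a multi-factor model
`calc_centrality()`	Compute rolling PCA centrality of factor returns	Gauge how strongly each factor moves with the factors' common component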

calc_factor_return()

Compute factor returns to evaluate a factor's stock-selection ability.

Example usage:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build the factor: small market cap
marketcap = data.get('etl:market_value')
factor = marketcap.rank(pct=True, axis=1) < 0.3

# Build features (MultiIndex: date x stock) and excess-return labels
features, labels = fa.generate_features_and_labels({'small_cap': factor}, resample='M')

# Compute the factor return per period
factor_return = fa.calc_factor_return(features, labels)['small_cap']

# Inspect the results
print(f"Mean factor return: {factor_return.mean():.2%}")
print(f"Factor return std: {factor_return.std():.2%}")
print(f"Factor Sharpe ratio (per period): {factor_return.mean() / factor_return.std():.2f}")

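Interpretation:
Positive: the factor has stock-selection ability (the selected stocks perform better)
Negative: the factor works in reverse (consider trading the opposite side)
Near 0: the factor has no stock-selection ability

Best practices

Validate factor stability over a long backtest period
Compare factor performance across market regimes (bull and bear markets)
Confirm the factor's effectiveness with IC analysis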

calc_ic()

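Compute the factor IC (Information Coefficient), which measures the correlation between a factor and future returns.

Example usage:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build the factor: revenue growth (3-month average vs. 12-month average)
revenue = data.get('monthly_revenue:當月營收')
factor = revenue.average(3) / revenue.average(12)

# Build features (MultiIndex: date x stock) and excess-return labels
features, labels = fa.generate_features_and_labels({'revenue_growth': factor}, resample='M')

# Compute the per-date IC; rank=True ranks the factor before correlating
ic = fa.calc_ic(features, labels, rank=True)['revenue_growth']

# Inspect the results
print(f"Mean IC: {ic.mean():.4f}")
print(f"IC std: {ic.std():.4f}")
print(f"IC_IR (IC Sharpe ratio): {ic.mean() / ic.std():.2f}")

# Plot the IC time series
ic.plot(title='Revenue growth factor IC')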

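IC evaluation guidelines:

|IC|	Rating	Meaning
> 0.05	Excellent	Strong stock-selection ability
0.03 ~ 0.05	Good	Has stock-selection ability
0.01 ~ 0.03	Fair	Weak stock-selection ability
< 0.01	Ineffective	No stock-selection ability

IC analysis notes

IC > 0: the factor is positively correlated with future returns (go long the high-scoring stocks)
IC < 0: the factor is negatively correlated with future returns (go long the low-scoring stocks)
Highly volatile IC: the factor is unstable and risky
IC_IR > 0.5: the factor performs well on a risk-adjusted basis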

calc_shapley_values()

Compute Shapley values to evaluate each factor's marginal contribution in a multi-factor model.

Example usage:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build several boolean factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # small market cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # momentum

# Build features (MultiIndex: date x stock) and excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3}, resample='M')

# Compute Shapley values (one column per factor)
shapley = fa.calc_shapley_values(features, labels)

# Inspect the average Shapley value of each factor
print("Mean Shapley value per factor:")
for name, value in shapley.mean().items():
    print(f"{name}: {value:.4f}")

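Interpretation:
High Shapley value: the factor contributes a lot and is important
Low Shapley value: the factor contributes little and is a candidate for removal
Negative Shapley value: the factor is harmful and drags down overall performance

Recommendations

Remove factors whose Shapley value is negative or close to 0
Keep factors with high Shapley values
Re-evaluate the strategy's performance after removing factors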

calc_centrality()

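Compute the rolling PCA centrality of factor returns, i.e. how strongly each factor moves with the dominant common component of all the factors' returns.

Example usage:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build boolean factors
marketcap = data.get('etl:market_value')
close = data.get('price:收盤價')
factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # small market cap
factor2 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # momentum

# Centrality is computed on factor returns (dates x factors), not on the raw factor
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'momentum': factor2}, resample='M')
factor_return = fa.calc_factor_return(features, labels)

# Rolling PCA centrality over a 12-period window
centrality = fa.calc_centrality(factor_return, window_periods=12)
print(centrality.tail())

Interpretation:
High centrality: the factor's returns load heavily on the dominant principal component(s), so it tends to move with the other factors
Low centrality: the factor's returns are relatively independent of the common component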

Full example: multi-factor strategy analysis

from finlab import data
from finlab.tools import factor_analysis as fa
from finlab.backtest import sim

# Step 1: build the factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # small market cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # momentum

# Step 2: build features (MultiIndex: date x stock) and excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3}, resample='M')

# Step 3: evaluate each factor
print("=== Factor return analysis ===")
factor_return = fa.calc_factor_return(features, labels)
ic = fa.calc_ic(features, labels, rank=True)
for name in features.columns:
    print(f"\n{name}:")
    print(f"  Mean return: {factor_return[name].mean():.2%}")
    print(f"  Mean IC: {ic[name].mean():.4f}")

# Step 4: compute Shapley values (contribution)
print("\n=== Shapley value analysis ===")
shapley = fa.calc_shapley_values(features, labels)
for name, value in shapley.mean().items():
    print(f"{name} contribution: {value:.4f}")

# Step 5: backtest the strategy
position = factor1 & factor2 & factor3
report = sim(position, resample='M')
report.display()

print("\n=== Strategy performance ===")
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")

常見問題

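Q: How do I tell whether a factor is effective?

Judge it on three metrics together:

# Assumes features and labels come from fa.generate_features_and_labels
name = 'small_cap'  # hypothetical: column name of the factor being evaluated

# 1. Factor return (should be significantly > 0)
factor_return = fa.calc_factor_return(features, labels)[name]
print(f"Mean factor return: {factor_return.mean():.2%}")

# 2. IC (should be > 0.02)
ic = fa.calc_ic(features, labels, rank=True)[name]
print(f"Mean IC: {ic.mean():.4f}")

# 3. IC_IR (should be > 0.5)
ic_ir = ic.mean() / ic.std()
print(f"IC_IR: {ic_ir:.2f}")

# Conclusion: treat the factor as effective only if all three checks pass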

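Q: How can Shapley values be used to optimize a strategy?

# factor1 ... factor5 are boolean factor DataFrames built earlier
factors = {'f1': factor1, 'f2': factor2, 'f3': factor3, 'f4': factor4, 'f5': factor5}
features, labels = fa.generate_features_and_labels(factors, resample='M')

# Average Shapley value of each factor over time
shapley = fa.calc_shapley_values(features, labels).mean()

# Drop factors whose contribution is low or negative
threshold = 0.001
valid_factors = [name for name, s in shapley.items() if s > threshold]

print(f"Original factor count: {len(factors)}")
print(f"Factor count after pruning: {len(valid_factors)}")

# Combine the remaining factors into one position
optimized_position = factors[valid_factors[0]]
for name in valid_factors[1:]:
    optimized_position = optimized_position & factors[name]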

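Q: What is the difference between IC and factor return?

IC (Information Coefficient):
Measures the correlation between factor values and future returns
Ranges over [-1, 1]; the closer to 1, the better
Suited to continuous factors (such as market cap or ROE)
Factor return:
Measures the average return of the stocks the factor selects
Suited to binary (True/False) factors
Directly reflects the stock-selection result

# Continuous factor: use IC
continuous_factor = data.get('price_earning_ratio:本益比')
features, labels = fa.generate_features_and_labels({'pe': continuous_factor}, resample='M')
ic = fa.calc_ic(features, labels, rank=True)

# Binary factor (lowest 30% P/E on each date): use factor return
binary_factor = continuous_factor.rank(pct=True, axis=1) < 0.3
features, labels = fa.generate_features_and_labels({'low_pe': binary_factor}, resample='M')
factor_return = fa.calc_factor_return(features, labels)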

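Resources

Complete factor analysis tutorial: end-to-end examples
Full strategy development workflow: includes the factor analysis steps
Machine learning strategies: use ML to discover factors automatically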