finlab.tools

Utility module providing event study and factor analysis functionality.

Use Cases

Analyze the impact of specific events on stock prices (event study)
Evaluate factor stock selection ability (factor analysis)
Optimize factor combinations in strategies
Understand the sources of strategy performance

Quick Examples

Event Study

from finlab.tools import event_study

# Analyze stock price performance after revenue announcements
# (Requires actual event data)

Factor Analysis

from finlab import data
from finlab.tools import factor_analysis as fa

# Prepare raw data
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')

# Build boolean factors
cond1 = marketcap.rank(pct=True, axis=1) < 0.3  # Small cap
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # Revenue growth

# Generate features and excess-return labels, then calculate factor returns
features, labels = fa.generate_features_and_labels(
    {'small_cap': cond1, 'revenue_growth': cond2}, resample='M'
)
factor_return = fa.calc_factor_return(features, labels)
print(factor_return)

Detailed Guide

See Factor Analysis Tutorial for:

Complete factor analysis workflow
Factor return calculation
Factor IC analysis
Shapley values contribution analysis

API Reference

event_study()

finlab.tools.event_study

create_factor_data

create_factor_data(factor, adj_close, days=[5, 10, 20, 60], event=None)

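Creates factor data that includes future returns.

PARAMETER	DESCRIPTION
`factor`	Factor data; the index is datetime and the columns are asset ids. TYPE: `DataFrame`
`adj_close`	Adjusted close prices; the index is datetime and the columns are asset ids. TYPE: `DataFrame`
`days`	Horizons over which future returns are computed. TYPE: `List[int]` DEFAULT: `[5, 10, 20, 60]`
`event`	Optional event data; in the example below it is a boolean DataFrame whose index is the event date and whose columns are asset ids. DEFAULT: `None`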

Return

Analytic plots and tables

Warning

This function is not identical to finlab.ml.alphalens.create_factor_data

Examples:

Cash capital increase and reduction analysis

import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study

factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')

# create event dataframe
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()

# calculate factor_data
factor_data = create_factor_data({'pb':factor}, adj_close, event=event)

r = event_study(factor_data, benchmark, adj_close)

plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)

event_study

event_study(factor_data, benchmark_adj_close, stock_adj_close, sample_period=(-45, -20), estimation_period=(-5, 20), plot=True)

Runs an event study and returns the abnormal returns of each stock on each day.

PARAMETER	DESCRIPTION
`factor_data`	factor data where index is datetime and columns is asset id TYPE: `DataFrame`
`benchmark_adj_close`	benchmark for CAPM TYPE: `DataFrame`
`stock_adj_close`	stock price for CAPM TYPE: `DataFrame`
`sample_period`	period for fitting CAPM TYPE: `(int, int)` DEFAULT: `(-45, -20)`
`estimation_period`	period for calculating alpha (abnormal return) TYPE: `(int, int)` DEFAULT: `(-5, 20)`
`plot`	plot the result TYPE: `bool` DEFAULT: `True`

Return

Abnormal returns of each stock on each day.

Examples:

Cash capital increase and reduction analysis

import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study

factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')

# create event dataframe
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()

# calculate factor_data
factor_data = create_factor_data({'pb':factor}, adj_close, event=event)

r = event_study(factor_data, benchmark, adj_close)

plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)

plot_event_study

plot_event_study(returns)

Plot the event study for the given returns.

PARAMETER	DESCRIPTION
`returns`	A DataFrame containing the returns data. TYPE: `DataFrame`

Return

ax (matplotlib.axes.Axes): The axes object containing the plot.

Event Study: Analyze the impact of specific events on stock prices.

Usage Notes

Event study is used to analyze the short-term and long-term impact of specific events (such as earnings announcements, investor conferences, dividend distributions) on stock prices.

factor_analysis

finlab.tools.factor_analysis

calc_centrality

calc_centrality(return_df, window_periods, n_components=1)

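Computes rolling asset concentration for the given time-series data.

This is a general-purpose function that can be applied to any DataFrame indexed by time with assets as columns (for example, factor returns). It is frequency-agnostic: the rolling window is specified by the integer window_periods.

PARAMETER	DESCRIPTION
`return_df`	A time-series DataFrame whose index is dates and whose columns are assets (for example, factor names). Despite the name `return_df`, the function accepts any per-asset time series, such as factor volatility, although the result can then no longer be interpreted as asset concentration. TYPE: `DataFrame`
`window_periods`	Length of the rolling window, counted in number of data points. For example, if `return_df` is monthly data, `window_periods=3` means a 3-month rolling window. TYPE: `int`
`n_components`	Number of PCA principal components used in the calculation. TYPE: `int` DEFAULT: `1`

RETURNS	DESCRIPTION
`DataFrame`	pd.DataFrame: A DataFrame containing the rolling concentration scores.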

Example

date	FactorA	FactorB
2025-01-01	0.1	0.2
2025-01-02	0.1	0.2
2025-01-03	0.1	0.2
2025-01-04	0.1	0.2
2025-01-05	0.1	0.2

calc_factor_return

calc_factor_return(features, labels)

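Computes the per-period performance of equal-weighted portfolios from features and labels.

This function is the core engine for factor performance calculation. It takes pre-built features and labels and computes the portfolio return of each factor.

The function automatically:

1. Validates that all features are boolean
2. Computes the per-period return of an equal-weighted portfolio for each factor
3. Trims the data before the first non-empty row

PARAMETER	DESCRIPTION
`features`	Feature DataFrame; the index is dates, the columns are factor names, and the values should be boolean. TYPE: `DataFrame`
`labels`	Label Series; the index is dates and the values are excess returns. TYPE: `Series`

RETURNS	DESCRIPTION
`DataFrame`	pd.DataFrame: A DataFrame indexed by date with one column per factor; each value is the equal-weighted portfolio performance for that period. The output starts at the first non-empty row to ensure data completeness.

RAISES	DESCRIPTION
`ValueError`	If the features are not boolean.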
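
The equal-weighted calculation described above can be sketched with plain pandas: for each date, average the labels of the stocks a boolean factor selects. The helper name `equal_weight_factor_return`, the (datetime, stock_id) MultiIndex layout, and the numbers are assumptions for illustration, not finlab's implementation.

```python
import pandas as pd

def equal_weight_factor_return(features: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Per date, the mean label of the stocks each boolean factor selects."""
    out = {}
    for name in features.columns:
        selected = features[name].eq(True)  # NaN counts as not selected
        out[name] = labels[selected].groupby(level="datetime").mean()
    return pd.DataFrame(out)

# Hypothetical data: two rebalance dates, three stocks
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["A", "B", "C"]],
    names=["datetime", "stock_id"],
)
features = pd.DataFrame(
    {
        "small_cap": [True, True, False, False, True, True],
        "momentum": [False, True, True, True, False, False],
    },
    index=index,
)
labels = pd.Series([0.02, 0.04, -0.01, 0.01, 0.03, -0.05], index=index)

result = equal_weight_factor_return(features, labels)
print(result)
```

On the first date `small_cap` selects A and B, so its return is the mean of 0.02 and 0.04.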

Example

from finlab import data
from finlab.tools.factor_analysis import calc_factor_return, generate_features_and_labels

price = data.get('etl:adj_close')
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')

# Generate features and labels first
features, labels = generate_features_and_labels({
    'marketcap': marketcap.rank(pct=True, axis=1) < 0.3,
    'revenue': (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) < 0.3,
    'momentum': price / price.shift(20) - 1 > 0
}, resample=revenue.index)

# Calculate factor returns
factor_return = calc_factor_return(features, labels)

# Sample output
print(factor_return.head())

datetime	marketcap	revenue	momentum
2013-04-30 00:00:00	0.018	-0.005	0.009
2013-05-31 00:00:00	0.004	-0.003	-0.001
2013-06-30 00:00:00	-0.013	-0.006	0.023
2013-07-31 00:00:00	0.007	-0.007	0.001
2013-08-31 00:00:00	0.014	-0.003	-0.005

calc_ic

calc_ic(features, labels, rank=False)

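Computes the correlation (IC) between features and labels, optionally ranking the features first.

PARAMETER	DESCRIPTION
`features`	Feature data; the index is a MultiIndex (date, stock id) and the columns are factor names. TYPE: `DataFrame`
`labels`	Label data; the index is a MultiIndex (date, stock id). TYPE: `Series`
`rank`	Whether to rank the features. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
	pd.DataFrame: The IC value for each date and each factor.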
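
As a rough illustration of a per-date IC, the sketch below groups a (datetime, stock_id) MultiIndex by date and correlates each feature column with the label inside each group. The helper name `ic_by_date`, the index level names, and the data are assumptions; how finlab ranks and correlates internally may differ.

```python
import pandas as pd

def ic_by_date(features: pd.DataFrame, labels: pd.Series, rank: bool = False) -> pd.DataFrame:
    """Pearson correlation between each feature column and the label, per date."""
    rows = {}
    for date, block in features.groupby(level="datetime"):
        if rank:
            block = block.rank(pct=True)  # cross-sectional percentile rank
        rows[date] = block.corrwith(labels.loc[block.index])
    return pd.DataFrame(rows).T  # index: dates, columns: factor names

# Hypothetical data: two dates, four stocks
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["A", "B", "C", "D"]],
    names=["datetime", "stock_id"],
)
features = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]}, index=index)
labels = pd.Series([0.01, 0.02, 0.03, 0.04, 0.04, 0.03, 0.02, 0.01], index=index)

ic = ic_by_date(features, labels)
ic_ir = ic["value"].mean() / ic["value"].std()  # mean IC divided by IC volatility
print(ic)
```

The toy labels rise with the feature on the first date and fall with it on the second, so the two ICs have opposite signs and the IC_IR is zero.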

calc_metric

calc_metric(factor, adj_close, days=[10, 20, 60, 120], func=corr)

Computes a factor metric.

PARAMETER	DESCRIPTION
`factor`	Factor. TYPE: `DataFrame`
`adj_close`	Stock prices. TYPE: `DataFrame`
`days`	Forecast horizons in days. Defaults to [10, 20, 60, 120]. TYPE: `list` DEFAULT: `[10, 20, 60, 120]`
`func`	Metric function. Defaults to corr. TYPE: `function` DEFAULT: `corr`

RETURNS	DESCRIPTION
	pd.DataFrame: The computed factor metric.

Example

from finlab import data
from finlab.tools.factor_analysis import calc_metric

factor = data.indicator('RSI')
adj_close = data.get('etl:adj_close')
calc_metric(factor, adj_close)

date	factor_10	factor_20	factor_60	factor_120
2010-01-01	0.1	0.2	0.3	0.4
2010-01-02	0.1	0.2	0.3	0.4
2010-01-03	0.1	0.2	0.3	0.4
2010-01-04	0.1	0.2	0.3	0.4
2010-01-05	0.1	0.2	0.3	0.4

calc_regression_stats

calc_regression_stats(df, p_value_threshold=0.05, r_squared_threshold=0.1)

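Runs a linear regression on each time series in the DataFrame and returns the raw statistics.

This function uses numpy for the linear regression and scipy.stats.t to compute an exact two-tailed p-value.

PARAMETER	DESCRIPTION
`df`	Time-series DataFrame; the index is a DatetimeIndex and the columns are the individual metric series. TYPE: `DataFrame`
`p_value_threshold`	p-value threshold used to judge whether the trend is statistically significant. TYPE: `float` DEFAULT: `0.05`
`r_squared_threshold`	R² threshold used to judge the explanatory power of the trend. TYPE: `float` DEFAULT: `0.1`

RETURNS	DESCRIPTION
`DataFrame`	pd.DataFrame: Linear regression statistics with the following columns:

slope: regression slope
intercept: regression intercept
r_squared: coefficient of determination (R²)
p_value: p-value of the slope (two-tailed test)
tail_estimate: estimated value at the tail of the time series
trend: trend classification ("up", "down", "flat")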
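
For reference, the statistics listed above can be reproduced for a single series with numpy and scipy.stats.t. This sketch assumes the regressor is simply the observation number 0..n-1, which the documentation does not state; the helper name `linear_trend` and the sample values are hypothetical.

```python
import numpy as np
from scipy import stats

def linear_trend(y):
    """OLS fit of y against 0..n-1; returns slope, intercept, R^2, two-tailed p-value."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    dof = len(y) - 2  # two estimated parameters
    std_err = np.sqrt(ss_res / dof / np.sum((x - x.mean()) ** 2))
    p_value = 2.0 * stats.t.sf(abs(slope / std_err), dof)
    return slope, intercept, r_squared, p_value

# Hypothetical IC series with an upward drift
series = [0.01, 0.02, 0.02, 0.04, 0.05, 0.05, 0.07]
slope, intercept, r_squared, p_value = linear_trend(series)
print(slope, intercept, r_squared, p_value)
```

The results can be cross-checked against `scipy.stats.linregress`, which reports the same slope, intercept, and two-sided p-value.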

Example

# Assume ic_df is a DataFrame containing IC time series
# ic_df = calc_ic(features, labels)

# 1. Compute regression statistics
trend_stats = calc_regression_stats(ic_df)
print(trend_stats)

# 2. Run custom analysis on the returned results
# Example: find factors that are statistically significant (p-value < 0.05) with an upward trend (slope > 0)
significant_up_trend = trend_stats[
    (trend_stats['p_value'] < 0.05) & (trend_stats['slope'] > 0)
]
print(significant_up_trend)

# 3. Inspect the trend classification
up_trends = trend_stats[trend_stats['trend'] == 'up']
down_trends = trend_stats[trend_stats['trend'] == 'down']
flat_trends = trend_stats[trend_stats['trend'] == 'flat']

calc_shapley_values

calc_shapley_values(features, labels)

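Computes Shapley values for the factors, to evaluate the marginal contribution of each factor to portfolio performance.

The Shapley value is a concept from cooperative game theory for fairly distributing a coalition's total payoff among its participants. In factor analysis, each factor is treated as a "player" and the portfolio return as the "coalition's total payoff".

Calculation steps:

1. Enumerate all possible factor combinations (from single factors to all factors)
2. Compute the portfolio return of each combination
3. Compute each factor's marginal contribution using the Shapley value formula

PARAMETER	DESCRIPTION
`features`	Feature DataFrame; the index is dates, the columns are factor names, and the values should be boolean. Each factor represents an investment strategy (True means the stock is selected). TYPE: `DataFrame`
`labels`	Label Series; the index is dates and the values are excess returns. The index should be a MultiIndex with 'datetime' and 'stock_id' levels. TYPE: `Series`

RETURNS	DESCRIPTION
`DataFrame`	pd.DataFrame: A DataFrame of Shapley values; the index is dates, the columns are factor names, and each value is that factor's Shapley value.

RAISES	DESCRIPTION
`ValueError`	If features is empty or has no columns; if the index of labels is not a MultiIndex; if features is not boolean; if the time indexes of features and labels do not match.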
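
The Shapley value formula referred to above can be illustrated with the standard library alone. In this sketch the payoff of every factor combination is given directly as a hypothetical single-period portfolio return, and the empty combination is assumed to pay 0; the helper name `shapley_values` is illustrative and this is not finlab's implementation.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result

# Hypothetical portfolio returns for every factor combination
returns = {
    frozenset(): 0.0,
    frozenset({"marketcap"}): 0.010,
    frozenset({"revenue"}): 0.004,
    frozenset({"marketcap", "revenue"}): 0.020,
}
phi = shapley_values(["marketcap", "revenue"], returns.__getitem__)
print(phi)
```

The two values sum to the return of the full combination (0.020), which is the efficiency property of Shapley values. The nested loop visits every subset, which is where the exponential cost in the number of factors comes from.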

Example

from finlab import data
from finlab.tools.factor_analysis import calc_shapley_values, generate_features_and_labels

price = data.get('etl:adj_close')
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')

# Generate features and labels
features, labels = generate_features_and_labels({
    'marketcap': marketcap.rank(pct=True, axis=1) < 0.3,
    'revenue': (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) < 0.3,
    'momentum': price / price.shift(20) - 1 > 0
}, resample=revenue.index)

# Calculate Shapley values
shapley_df = calc_shapley_values(features, labels)

print(shapley_df.head())

datetime	marketcap	revenue	momentum
2013-04-30 00:00:00	0.012	-0.003	0.006
2013-05-31 00:00:00	0.002	-0.001	-0.001
2013-06-30 00:00:00	-0.008	-0.004	0.015
2013-07-31 00:00:00	0.004	-0.004	0.001
2013-08-31 00:00:00	0.009	-0.002	-0.003

Note

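The computational complexity of Shapley values is O(2^n), where n is the number of factors
With many factors, the computation can take a long time
Keep the number of factors at 10 or fewer to ensure a reasonable computation time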

generate_features_and_labels

generate_features_and_labels(dfs, resample)

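Generates factor features and labels, the core step of factor analysis.

This function wraps the standard workflow for generating features and labels in factor analysis:

1. Uses finlab.ml.feature.combine to convert the factor dictionary into a feature DataFrame
2. Uses finlab.ml.label.excess_over_mean to generate excess-return labels

PARAMETER DESCRIPTION

dfs

Factor dictionary containing factor names and the corresponding factor data or calculation functions.

Key (str): Factor name; becomes a column name in the output DataFrame
Value (Union[pd.DataFrame, Callable]): Factor data or a calculation function
- pd.DataFrame: Factor data provided directly
- Callable: A function that computes the factor; it is called to obtain the factor data

This is the standard input format of finlab.ml.feature.combine.

TYPE: Dict[str, Union[DataFrame, Callable]]

resample

Resampling frequency used when generating features and labels.

For example: 'M' (monthly), 'Q' (quarterly), 'Y' (yearly)

TYPE: str

RETURNS	DESCRIPTION
`tuple[DataFrame, Series]`	tuple[pd.DataFrame, pd.Series]: - pd.DataFrame: Feature DataFrame; the index is dates and the columns are factor names - pd.Series: Label Series; the index is dates and the values are excess returns

RAISES	DESCRIPTION
`ValueError`	If the input factor dictionary is empty or invalid.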

Example

from finlab import data
from finlab.tools.factor_analysis import generate_features_and_labels

price = data.get('etl:adj_close')
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')

features, labels = generate_features_and_labels({
    'marketcap': marketcap.rank(pct=True, axis=1) < 0.3,
    'revenue': (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) < 0.3,
    'momentum': price / price.shift(20) - 1 > 0
}, resample=revenue.index)

print(f"Features shape: {features.shape}")
print(f"Labels shape: {labels.shape}")
Features shape: (120, 3)
Labels shape: (120,)

ic

ic(factor, adj_close, days=[10, 20, 60, 120])

Computes the IC of a factor.

PARAMETER	DESCRIPTION
`factor`	Factor. TYPE: `DataFrame`
`adj_close`	Stock prices. TYPE: `DataFrame`
`days`	Forecast horizons in days. Defaults to [10, 20, 60, 120]. TYPE: `list` DEFAULT: `[10, 20, 60, 120]`

RETURNS	DESCRIPTION
	pd.DataFrame: The computed factor IC.

Example

from finlab import data
from finlab.tools.factor_analysis import ic

factor = data.indicator('RSI')
adj_close = data.get('etl:adj_close')
ic(factor, adj_close)

date	factor_10	factor_20	factor_60	factor_120
2010-01-01	0.1	0.2	0.3	0.4
2010-01-02	0.1	0.2	0.3	0.4
2010-01-03	0.1	0.2	0.3	0.4
2010-01-04	0.1	0.2	0.3	0.4
2010-01-05	0.1	0.2	0.3	0.4

is_boolean_series

is_boolean_series(series)

Check if a pandas Series contains boolean values, handling NaN values.

In older pandas versions, boolean Series with NaN values get converted to float dtype. This function detects such cases by checking if the non-NaN values are boolean.

PARAMETER	DESCRIPTION
`series`	The Series to check TYPE: `Series`

RETURNS	DESCRIPTION
`bool`	True if the Series contains boolean values (including NaN), False otherwise TYPE: `bool`
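
One way such a check could work is sketched below: drop NaN, then test whether every remaining value equals True or False. The helper name `is_boolean_like` is hypothetical and the logic is an assumption, not finlab's code. Because `1.0 == True` in Python, the check also accepts 0/1 numeric values, which is how a boolean column converted to float would look.

```python
import numpy as np
import pandas as pd

def is_boolean_like(series: pd.Series) -> bool:
    """True if the Series is bool-typed or all of its non-NaN values equal True/False."""
    if pd.api.types.is_bool_dtype(series):
        return True
    non_null = series.dropna()
    if non_null.empty:
        return False
    # `v in (True, False)` also matches 1/0 and 1.0/0.0 because they compare equal
    return all(v in (True, False) for v in non_null)

pure = is_boolean_like(pd.Series([True, False, True]))
with_nan = is_boolean_like(pd.Series([True, False, np.nan]))
numeric = is_boolean_like(pd.Series([1, 2, 3]))
print(pure, with_nan, numeric)
```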

Example

import pandas as pd
import numpy as np

# Pure boolean
s1 = pd.Series([True, False, True])
is_boolean_series(s1)  # True

# Boolean with NaN (becomes float in old pandas)
s2 = pd.Series([True, False, np.nan])
is_boolean_series(s2)  # True

# Non-boolean
s3 = pd.Series([1, 2, 3])
is_boolean_series(s3)  # False

Factor analysis module for evaluating factor stock selection ability and contribution.

Key Functions:

Function	Description	Purpose
`calc_factor_return()`	Calculate factor return	Evaluate overall factor performance
`calc_ic()`	Calculate factor IC (Information Coefficient)	Evaluate factor correlation with future returns
`calc_shapley_values()`	Calculate Shapley values	Evaluate marginal contribution of each factor in multi-factor models
`calc_centrality()`	Calculate factor concentration	Evaluate factor distribution across stocks

calc_factor_return()

Calculate factor return to evaluate stock selection ability.

Usage Examples:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build factor: small cap (boolean DataFrame)
marketcap = data.get('etl:market_value')
factor = marketcap.rank(pct=True, axis=1) < 0.3

# Generate features and excess-return labels (monthly)
features, labels = fa.generate_features_and_labels({'small_cap': factor}, resample='M')

# Calculate factor return (the result has one column per factor)
factor_return = fa.calc_factor_return(features, labels)['small_cap']

# View results
print(f"Average factor return: {factor_return.mean():.2%}")
print(f"Factor return std dev: {factor_return.std():.2%}")
print(f"Factor Sharpe ratio: {factor_return.mean() / factor_return.std():.2f}")

Interpretation:

Positive: Factor has stock selection ability (selected stocks outperform)
Negative: Factor works in reverse (consider inverting the signal)
Near 0: Factor has no stock selection ability

Best Practices

Use long backtest periods to verify factor stability
Compare factor performance across different market environments (bull/bear)
Combine with IC analysis to confirm factor effectiveness

calc_ic()

Calculate factor IC (Information Coefficient), measuring the correlation between factor and future returns.

Usage Examples:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build factor: revenue growth rate
revenue = data.get('monthly_revenue:當月營收')
factor = revenue.average(3) / revenue.average(12)  # 3-month avg vs 12-month avg

# Generate features and excess-return labels (monthly)
features, labels = fa.generate_features_and_labels({'revenue_growth': factor}, resample='M')

# Calculate IC (the result has one column per factor)
ic = fa.calc_ic(features, labels)['revenue_growth']

# View results
print(f"Average IC: {ic.mean():.4f}")
print(f"IC std dev: {ic.std():.4f}")
print(f"IC_IR (IC Sharpe ratio): {ic.mean() / ic.std():.2f}")

# Visualize IC time series
ic.plot(title='Revenue Growth Factor IC')

IC Evaluation Standards:

IC Value	Rating	Description
> 0.05	Excellent	Factor has strong stock selection ability
0.03 ~ 0.05	Good	Factor has stock selection ability
0.01 ~ 0.03	Average	Factor has weak stock selection ability
< 0.01	Ineffective	Factor has no stock selection ability

IC Analysis Notes

IC > 0: Factor positively correlates with future returns (go long on high-scoring stocks)
IC < 0: Factor negatively correlates with future returns (go long on low-scoring stocks)
High IC variance: Factor is unstable, higher risk
IC_IR > 0.5: Factor performs well on a risk-adjusted basis

calc_shapley_values()

Calculate Shapley values to evaluate the marginal contribution of each factor in a multi-factor model.

Usage Examples:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build multiple factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # Small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # Revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # Momentum

# Generate features and excess-return labels (monthly)
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3}, resample='M'
)

# Calculate Shapley values (index: dates, columns: factor names)
shapley = fa.calc_shapley_values(features, labels)

# View results: average Shapley value of each factor over time
print("Shapley values for each factor:")
for name, value in shapley.mean().items():
    print(f"{name}: {value:.4f}")

Interpretation:

High Shapley value: Factor contributes significantly, high importance
Low Shapley value: Factor contributes little, consider removing
Negative Shapley value: Factor is harmful, drags down overall performance

Application Recommendations

Remove factors with negative or near-zero Shapley values
Retain factors with high Shapley values
Re-evaluate strategy performance after removing factors

calc_centrality()

Calculate rolling concentration (centrality) scores for a time-series DataFrame whose columns are assets, such as factor returns.

Usage Examples:

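from finlab import data
from finlab.tools import factor_analysis as fa

# Build factors
marketcap = data.get('etl:market_value')
close = data.get('price:收盤價')
factors = {
    'small_cap': marketcap.rank(pct=True, axis=1) < 0.3,
    'momentum': (close / close.shift(20)).rank(pct=True, axis=1) > 0.7,
}

# Factor returns are the time series passed to calc_centrality
features, labels = fa.generate_features_and_labels(factors, resample='M')
factor_return = fa.calc_factor_return(features, labels)

# Calculate rolling concentration over 12 periods (12 months for monthly data)
centrality = fa.calc_centrality(factor_return, window_periods=12)

print(centrality.tail())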

Interpretation:

High concentration: Factor is concentrated in few stocks, insufficient diversification
Low concentration: Factor is distributed across many stocks, good diversification

Complete Example: Multi-Factor Strategy Analysis

from finlab import data
from finlab.tools import factor_analysis as fa
from finlab.backtest import sim

# Step 1: Build factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # Small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # Revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # Momentum

# Step 2: Generate features and labels (excess returns)
features, labels = fa.generate_features_and_labels(
    {'factor1': factor1, 'factor2': factor2, 'factor3': factor3}, resample='M'
)

# Step 3: Analyze each factor
print("=== Factor Return Analysis ===")
factor_return = fa.calc_factor_return(features, labels)
ic = fa.calc_ic(features, labels, rank=True)
for name in features.columns:
    print(f"\n{name}:")
    print(f"  Average return: {factor_return[name].mean():.2%}")
    print(f"  Average IC: {ic[name].mean():.4f}")

# Step 4: Calculate Shapley values (contribution), averaged over time
print("\n=== Shapley Values Analysis ===")
shapley = fa.calc_shapley_values(features, labels)
for name, value in shapley.mean().items():
    print(f"{name} contribution: {value:.4f}")

# Step 5: Backtest strategy
position = factor1 & factor2 & factor3
report = sim(position, resample='M')
report.display()

print("\n=== Strategy Performance ===")
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")

FAQ

Q: How do I determine if a factor is effective?

Use three metrics for comprehensive evaluation:

# Assumes `factor` is a boolean factor DataFrame
features, labels = fa.generate_features_and_labels({'factor': factor}, resample='M')

# 1. Factor return (should be significantly > 0)
factor_return = fa.calc_factor_return(features, labels)['factor']
print(f"Average factor return: {factor_return.mean():.2%}")

# 2. IC (should be > 0.02)
ic = fa.calc_ic(features, labels, rank=True)['factor']
print(f"Average IC: {ic.mean():.4f}")

# 3. IC_IR (should be > 0.5)
ic_ir = ic.mean() / ic.std()
print(f"IC_IR: {ic_ir:.2f}")

# Conclusion: Factor is effective only when all three metrics pass

Q: How do I apply Shapley values to strategy optimization?

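# Calculate Shapley values for all factors, averaged over time
factors = {'factor1': factor1, 'factor2': factor2, 'factor3': factor3,
           'factor4': factor4, 'factor5': factor5}
features, labels = fa.generate_features_and_labels(factors, resample='M')
shapley = fa.calc_shapley_values(features, labels).mean()

# Remove factors with low or negative contribution
threshold = 0.001
valid_factors = [factors[name] for name, s in shapley.items() if s > threshold]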

print(f"Original factor count: {len(factors)}")
print(f"Optimized factor count: {len(valid_factors)}")

# Use optimized factor combination
optimized_position = valid_factors[0]
for factor in valid_factors[1:]:
    optimized_position = optimized_position & factor

Q: What is the difference between IC and factor return?

IC (Information Coefficient):
Measures the correlation between factor values and future returns
Range [-1, 1]; a larger absolute value means stronger predictive power, and the sign gives the direction
Suitable for evaluating continuous factors (e.g., market cap, ROE)
Factor Return:
Measures the average return of stocks selected by the factor
Suitable for evaluating binary factors (True/False)
Directly reflects stock selection effectiveness

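# Use IC for continuous factors
continuous_factor = data.get('price_earning_ratio:本益比')
features, labels = fa.generate_features_and_labels({'pe': continuous_factor}, resample='M')
ic = fa.calc_ic(features, labels)

# Use factor return for binary factors
# Compare each row with its own 30% quantile (align on the date index, not the columns)
binary_factor = continuous_factor.lt(continuous_factor.quantile(0.3, axis=1), axis=0)
features, labels = fa.generate_features_and_labels({'low_pe': binary_factor}, resample='M')
factor_return = fa.calc_factor_return(features, labels)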

Resources

Factor Analysis Tutorial - End-to-end examples
Complete Strategy Development Workflow - Includes factor analysis steps
Machine Learning Strategies - Use ML to automatically discover factors