finlab.tools
Utility module providing event study and factor analysis functionality.
Use Cases
- Analyze the impact of specific events on stock prices (event study)
- Evaluate factor stock selection ability (factor analysis)
- Optimize factor combinations in strategies
- Understand the sources of strategy performance
Quick Examples
Event Study
from finlab.tools import event_study
# Analyze stock price performance after revenue announcements
# (Requires actual event data)
Factor Analysis
from finlab import data
from finlab.tools import factor_analysis as fa
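# Load the raw data
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
# Build boolean factors
cond1 = marketcap.rank(pct=True, axis=1) < 0.3 # Small cap
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7 # Revenue growth
# Build the MultiIndex features and excess-return labels described in the API Reference
features, labels = fa.generate_features_and_labels(
    {'small_cap': cond1, 'revenue_growth': cond2}, resample='M'
)
# Calculate factor return (one column per factor)
factor_return = fa.calc_factor_return(features, labels)
print(factor_return)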
Detailed Guide
See Factor Analysis Tutorial for:
- Complete factor analysis workflow
- Factor return calculation
- Factor IC analysis
- Shapley values contribution analysis
API Reference
event_study()
finlab.tools.event_study
create_factor_data
Creates factor data that includes future returns.
| PARAMETER | DESCRIPTION |
|---|---|
| factor | Factor data whose index is datetime and whose columns are asset IDs. |
| adj_close | Adjusted close prices whose index is datetime and whose columns are asset IDs. |
| days | Future return periods to consider. |
Return
Analytic plots and tables
Warning
This function is not identical to finlab.ml.alphalens.create_factor_data
Examples:
import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study
factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')
# Create the event DataFrame: True where a stock has an ex-rights trading date
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()
# Calculate factor_data
factor_data = create_factor_data({'pb': factor}, adj_close, event=event)
# Run the event study; r holds the abnormal returns
r = event_study(factor_data, benchmark, adj_close)
# Mean abnormal return per column (bars) and its cumulative sum (line)
plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)
event_study
event_study(factor_data, benchmark_adj_close, stock_adj_close, sample_period=(-45, -20), estimation_period=(-5, 20), plot=True)
Runs an event study and returns the abnormal returns of each stock on each day.
| PARAMETER | DESCRIPTION |
|---|---|
| factor_data | Factor data whose index is datetime and whose columns are asset IDs. |
| benchmark_adj_close | Benchmark for CAPM. |
| stock_adj_close | Stock prices for CAPM. |
| sample_period | Period used to fit CAPM. |
| estimation_period | Period used to calculate alpha (abnormal return). |
| plot | Whether to plot the result. |
Return
Abnormal returns of each stock on each day.
Examples:
import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study
factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')
# Create the event DataFrame: True where a stock has an ex-rights trading date
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()
# Calculate factor_data
factor_data = create_factor_data({'pb': factor}, adj_close, event=event)
# Run the event study; r holds the abnormal returns
r = event_study(factor_data, benchmark, adj_close)
# Mean abnormal return per column (bars) and its cumulative sum (line)
plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)
Event Study: Analyze the impact of specific events on stock prices.
Usage Notes
Event study is used to analyze the short-term and long-term impact of specific events (such as earnings announcements, investor conferences, dividend distributions) on stock prices.
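The roles of sample_period (fit the model) and estimation_period (measure abnormal returns) can be illustrated with a minimal market-model sketch on synthetic data. It assumes NumPy, half-open windows counted in rows relative to the event, and a hypothetical helper named abnormal_returns. It is not finlab's implementation, whose window conventions may differ.

```python
import numpy as np

def abnormal_returns(stock_ret, bench_ret, event_idx,
                     sample_period=(-45, -20), estimation_period=(-5, 20)):
    """Market-model abnormal returns around one event (hypothetical helper)."""
    s0, s1 = event_idx + sample_period[0], event_idx + sample_period[1]
    e0, e1 = event_idx + estimation_period[0], event_idx + estimation_period[1]
    # Fit stock_ret = alpha + beta * bench_ret on the pre-event sample window.
    beta, alpha = np.polyfit(bench_ret[s0:s1], stock_ret[s0:s1], deg=1)
    # Abnormal return = realised return minus the return the fitted model implies.
    return stock_ret[e0:e1] - (alpha + beta * bench_ret[e0:e1])

rng = np.random.default_rng(0)
bench = rng.normal(0.0, 0.01, 200)   # synthetic benchmark daily returns
stock = 0.0002 + 1.2 * bench         # stock driven entirely by the benchmark
stock[100] += 0.05                   # inject a 5% jump on the event day
ar = abnormal_returns(stock, bench, event_idx=100)
```

Because the synthetic stock follows the benchmark exactly, the only non-zero abnormal return is the injected jump, at offset 5 of the 25-row window (the event day).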
factor_analysis
finlab.tools.factor_analysis
Factor analysis toolkit -- public facade.
All public symbols are re-exported from focused submodules so that existing from finlab.tools.factor_analysis import ... imports keep working unchanged.
Submodules
- finlab.tools.factor_metrics: IC, correlation, NDCG scoring
- finlab.tools.factor_returns: boolean factor returns and Shapley values
- finlab.tools.factor_centrality: PCA-based rolling centrality
- finlab.tools.factor_regression: OLS trend analysis
calc_centrality
Compute rolling PCA centrality over return_df.
| PARAMETER | DESCRIPTION |
|---|---|
| return_df | Time-series DataFrame (dates x assets/factors). |
| window_periods | Rolling window length in rows. |
| n_components | Number of PCA components. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of rolling centrality scores. |
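The exact centrality formula is not stated here, so the following is only a sketch of one common PCA-based definition: within each rolling window, every column's absolute loadings on the top components are weighted by the share of variance those components explain. The function name rolling_centrality_sketch and the synthetic data are hypothetical, and finlab's scores may be computed differently.

```python
import numpy as np
import pandas as pd

def rolling_centrality_sketch(return_df, window_periods, n_components):
    """Rolling PCA centrality (one possible definition, for illustration)."""
    scores, dates = [], []
    for end in range(window_periods, len(return_df) + 1):
        window = return_df.iloc[end - window_periods:end].to_numpy()
        eigvals, eigvecs = np.linalg.eigh(np.cov(window, rowvar=False))
        top = np.argsort(eigvals)[::-1][:n_components]   # largest components first
        weights = eigvals[top] / eigvals.sum()           # variance share per component
        loadings = np.abs(eigvecs[:, top])
        loadings = loadings / loadings.sum(axis=0)       # each component sums to 1
        # Weight each column's loadings by how much variance the component explains.
        scores.append((loadings * weights).sum(axis=1) / weights.sum())
        dates.append(return_df.index[end - 1])
    return pd.DataFrame(scores, index=dates, columns=return_df.columns)

rng = np.random.default_rng(1)
returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(30, 3)),
    index=pd.date_range("2024-01-01", periods=30),
    columns=["small_cap", "revenue_growth", "momentum"],
)
centrality = rolling_centrality_sketch(returns, window_periods=20, n_components=2)
```

Under this definition each row of scores sums to 1, so a score can be read as the column's share of the dominant common variation in that window.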
calc_factor_return
Compute equal-weight portfolio returns per boolean factor.
Each column in features must be boolean. For every date the mean label value across selected (True) stocks is returned.
| PARAMETER | DESCRIPTION |
|---|---|
| features | Boolean feature DataFrame (MultiIndex: date x stock). |
| labels | Excess-return labels with the same MultiIndex. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of per-period returns, one column per factor, starting from the first fully-populated date. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If any feature column is not boolean. |
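The equal-weight calculation described above can be sketched in plain pandas. The helper factor_return_sketch, the index level names, and the toy values are assumptions for illustration. The sketch also omits the "first fully-populated date" trimming that the real function performs.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["1101", "2330", "2603"]],
    names=["datetime", "stock_id"],  # hypothetical level names
)
features = pd.DataFrame(
    {"small_cap": [True, False, True, False, True, True],
     "momentum": [False, True, True, True, False, False]},
    index=index,
)
labels = pd.Series([0.02, -0.01, 0.04, 0.03, 0.01, -0.03], index=index)

def factor_return_sketch(features, labels):
    """Mean label of the selected (True) stocks, per date and per factor."""
    if not all(features.dtypes == bool):
        raise ValueError("every feature column must be boolean")
    per_factor = {
        name: labels[features[name]].groupby(level=0).mean()
        for name in features.columns
    }
    return pd.DataFrame(per_factor)

result = factor_return_sketch(features, labels)
```

For the first date, small_cap selects 1101 and 2603, so its return is the average of 0.02 and 0.04.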
calc_ic
Compute per-date IC between features and labels.
| PARAMETER | DESCRIPTION |
|---|---|
| features | MultiIndex DataFrame (date, stock) with factor columns. |
| labels | MultiIndex Series with the same index. |
| rank | If |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of IC values per date and factor. |
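A per-date IC can be sketched as a cross-sectional correlation computed separately for each date. The helper ic_sketch and the toy data are hypothetical, and the sketch assumes that rank=True means ranking within each date before correlating (a Spearman-style IC). The description of rank above is incomplete, so the real behaviour may differ.

```python
import pandas as pd

def ic_sketch(features, labels, rank=False):
    """Per-date correlation between each factor column and the labels."""
    if rank:
        # Rank within each date so the Pearson correlation becomes a rank correlation.
        features = features.groupby(level=0).rank()
        labels = labels.groupby(level=0).rank()
    rows = {
        date: group.corrwith(labels.loc[group.index])
        for date, group in features.groupby(level=0)
    }
    return pd.DataFrame(rows).T   # rows: dates, columns: factors

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["1101", "2330", "2603", "2882"]],
    names=["datetime", "stock_id"],  # hypothetical level names
)
features = pd.DataFrame({"value": [1.0, 2.0, 3.0, 10.0, 4.0, 3.0, 2.0, 1.0]}, index=index)
labels = pd.Series([0.01, 0.02, 0.03, 0.04, 0.01, 0.02, 0.03, 0.05], index=index)
ic = ic_sketch(features, labels, rank=True)
```

On the first date the factor orders the stocks exactly as the labels do, so the rank IC is 1. On the second date the order is reversed, so it is -1.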
calc_metric
Compute a cross-sectional metric between factor and forward returns.
| PARAMETER | DESCRIPTION |
|---|---|
| factor | Single factor DataFrame or dict of named factor DataFrames. |
| adj_close | Adjusted close prices. |
| days | Forward-return horizons (default |
| func | Scoring callable applied per date group (default |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with one column per |
calc_regression_stats
Run per-column OLS regressions and classify each trend.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Time-series DataFrame with a |
| p_value_threshold | Significance level for trend classification. |
| r_squared_threshold | Minimum R-squared for a non-flat trend. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with columns |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the DataFrame index is not a |
calc_shapley_values
Compute Shapley values measuring each factor's marginal contribution.
Enumerates all 2^n subsets of factors, computes the equal-weight portfolio return of each subset, and distributes the return evenly among the factors in that subset.
| PARAMETER | DESCRIPTION |
|---|---|
| features | Boolean feature DataFrame (MultiIndex: date x stock). |
| labels | Excess-return labels with the same MultiIndex. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame of Shapley values, one column per factor. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If features is empty, labels lacks a MultiIndex, or any feature column is not boolean. |
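For intuition, here is the textbook Shapley formula applied to a toy two-factor case, using only the standard library. The stock data, the helper names, and the convention that an empty factor set earns 0 are assumptions. The library's own weighting, as described above, may differ from this textbook form.

```python
from itertools import combinations
from math import factorial

# Toy data: each stock has a forward return and the set of factors it satisfies.
stocks = {
    "A": (0.04, {"small_cap", "growth"}),
    "B": (0.02, {"small_cap"}),
    "C": (-0.01, {"growth"}),
    "D": (0.00, set()),
}

def portfolio_return(factors):
    """Equal-weight return of stocks satisfying every factor in the subset."""
    if not factors:
        return 0.0  # assumption: no factors means no position
    picked = [ret for ret, flags in stocks.values() if factors <= flags]
    return sum(picked) / len(picked) if picked else 0.0

def shapley_values(players, value):
    """Average marginal contribution of each player over all coalitions."""
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {player}) - value(coalition))
        result[player] = total
    return result

phi = shapley_values(["small_cap", "growth"], portfolio_return)
```

The two values sum to the return of the portfolio that applies both factors (0.04 here), which is the efficiency property that makes Shapley values usable as a contribution breakdown.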
generate_features_and_labels
Build a feature matrix and an excess-return label vector.
Wraps finlab.ml.feature.combine and finlab.ml.label.excess_over_mean.
| PARAMETER | DESCRIPTION |
|---|---|
| dfs | Dict mapping factor names to DataFrames (or callables). |
| resample | Resampling frequency / index passed to the ML helpers. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[DataFrame, Series] | |
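As a rough picture of what an excess-return label looks like, the sketch below computes a next-period return per stock, subtracts the cross-sectional mean for each date, and stacks the result into a (date, stock) index. The prices are made up, and treating excess_over_mean as "forward return minus the cross-sectional mean" is an assumption based on its name, not a description of finlab's code.

```python
import pandas as pd

close = pd.DataFrame(
    {"1101": [100.0, 102.0, 104.04], "2330": [100.0, 110.0, 121.0]},
    index=pd.date_range("2024-01-01", periods=3),
)
forward = close.shift(-1) / close - 1                # next-period return per stock
excess = forward.sub(forward.mean(axis=1), axis=0)   # subtract each date's mean
labels = excess.stack().dropna()                     # long format: (date, stock)
```

The last date has no forward return, so it drops out of the labels. By construction the labels for each remaining date sum to zero.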
is_boolean_series
Check whether series contains only boolean-like values (including NaN).
Handles the case where older pandas converts bool + NaN to float.
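A check of this kind can be sketched as follows. The function name looks_boolean is hypothetical and the logic is an illustration, not the library's code: missing values are dropped first, and the remaining values must all equal True or False, which also accepts 1.0 and 0.0.

```python
import numpy as np
import pandas as pd

def looks_boolean(series):
    """True if every non-missing value equals True/False (1.0/0.0 included)."""
    return all(value in (True, False) for value in series.dropna().unique())

mixed = pd.Series([True, False, np.nan])    # no longer dtype bool because of NaN
as_float = pd.Series([1.0, 0.0, np.nan])    # booleans that were converted to float
not_bool = pd.Series([0.5, 1.0])
```

Here looks_boolean accepts mixed and as_float and rejects not_bool, which a plain dtype comparison against bool would not do.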
Factor analysis module for evaluating factor stock selection ability and contribution.
Key Functions:
| Function | Description | Purpose |
|---|---|---|
| calc_factor_return() | Calculate factor return | Evaluate overall factor performance |
| calc_ic() | Calculate factor IC (Information Coefficient) | Evaluate factor correlation with future returns |
| calc_shapley_values() | Calculate Shapley values | Evaluate marginal contribution of each factor in multi-factor models |
| calc_centrality() | Calculate rolling PCA centrality | Evaluate how closely each return series tracks the dominant common components |
calc_factor_return()
Calculate factor return to evaluate stock selection ability.
Usage Examples:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build factor: small cap
marketcap = data.get('etl:market_value')
factor = marketcap.rank(pct=True, axis=1) < 0.3
# Build MultiIndex (date x stock) features and excess-return labels,
# the input format described in the API Reference
features, labels = fa.generate_features_and_labels({'small_cap': factor}, resample='M')
# Calculate factor return (a DataFrame with one column per factor)
factor_return = fa.calc_factor_return(features, labels)['small_cap']
# View results
print(f"Average factor return: {factor_return.mean():.2%}")
print(f"Factor return std dev: {factor_return.std():.2%}")
print(f"Factor Sharpe ratio: {factor_return.mean() / factor_return.std():.2f}")
Interpretation:
- Positive: Factor has stock selection ability (selected stocks outperform)
- Negative: Factor works in reverse (consider inverting the signal)
- Near 0: Factor has no stock selection ability
Best Practices
- Use long backtest periods to verify factor stability
- Compare factor performance across different market environments (bull/bear)
- Combine with IC analysis to confirm factor effectiveness
calc_ic()
Calculate factor IC (Information Coefficient), measuring the correlation between factor and future returns.
Usage Examples:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build factor: revenue growth rate
revenue = data.get('monthly_revenue:當月營收')
factor = revenue.average(3) / revenue.average(12) # 3-month avg vs 12-month avg
# Build MultiIndex (date x stock) features and labels, as described in the API Reference
features, labels = fa.generate_features_and_labels({'revenue_growth': factor}, resample='M')
# Calculate IC (a DataFrame with one row per date and one column per factor)
ic = fa.calc_ic(features, labels)['revenue_growth']
# View results
print(f"Average IC: {ic.mean():.4f}")
print(f"IC std dev: {ic.std():.4f}")
print(f"IC_IR (IC Sharpe ratio): {ic.mean() / ic.std():.2f}")
# Visualize IC time series
ic.plot(title='Revenue Growth Factor IC')
IC Evaluation Standards:
| IC Value | Rating | Description |
|---|---|---|
| > 0.05 | Excellent | Factor has strong stock selection ability |
| 0.03 ~ 0.05 | Good | Factor has stock selection ability |
| 0.01 ~ 0.03 | Average | Factor has weak stock selection ability |
| < 0.01 | Ineffective | Factor has no stock selection ability |
IC Analysis Notes
- IC > 0: Factor positively correlates with future returns (go long on high-scoring stocks)
- IC < 0: Factor negatively correlates with future returns (go long on low-scoring stocks)
- High IC variance: Factor is unstable, higher risk
- IC_IR > 0.5: Factor performs well on a risk-adjusted basis
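The statistics in these notes can be collected from a per-period IC series in a few lines. The IC values below are made up for illustration, and hit_rate (the share of periods with a positive IC) is an extra summary that this page does not define.

```python
import pandas as pd

ic = pd.Series([0.04, 0.02, -0.01, 0.05, 0.03])  # made-up per-period IC values
summary = {
    "mean_ic": ic.mean(),
    "ic_std": ic.std(),                 # sample standard deviation (ddof=1)
    "ic_ir": ic.mean() / ic.std(),      # mean IC per unit of IC volatility
    "hit_rate": (ic > 0).mean(),        # share of periods with positive IC
}
```

With so few observations the IC_IR is very noisy, so in practice these figures are only meaningful over a long history.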
calc_shapley_values()
Calculate Shapley values to evaluate the marginal contribution of each factor in a multi-factor model.
Usage Examples:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build multiple factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')
factor1 = marketcap.rank(pct=True, axis=1) < 0.3 # Small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7 # Revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7 # Momentum
# Build MultiIndex (date x stock) features and excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3}, resample='M'
)
# Calculate Shapley values (a DataFrame with one column per factor)
shapley = fa.calc_shapley_values(features, labels)
# View results: average Shapley value of each factor
print("Shapley values for each factor:")
for name, value in shapley.mean().items():
    print(f"{name}: {value:.4f}")
Interpretation:
- High Shapley value: Factor contributes significantly, high importance
- Low Shapley value: Factor contributes little, consider removing
- Negative Shapley value: Factor is harmful, drags down overall performance
Application Recommendations
- Remove factors with negative or near-zero Shapley values
- Retain factors with high Shapley values
- Re-evaluate strategy performance after removing factors
calc_centrality()
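Calculate rolling PCA centrality over a return time series (dates x assets/factors), as described in the API Reference.
Usage Examples:
from finlab import data
from finlab.tools import factor_analysis as fa
# Build factors
marketcap = data.get('etl:market_value')
close = data.get('price:收盤價')
factor1 = marketcap.rank(pct=True, axis=1) < 0.3 # Small cap
factor2 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7 # Momentum
# calc_centrality expects a return time series, so compute factor returns first
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'momentum': factor2}, resample='M'
)
factor_return = fa.calc_factor_return(features, labels)
# Calculate rolling centrality (the window length and component count are illustrative)
centrality = fa.calc_centrality(factor_return, window_periods=12, n_components=1)
print(centrality.tail())
Interpretation:
- High centrality: The series moves closely with the dominant principal components of the window, so it adds little diversification
- Low centrality: The series is relatively independent of the dominant components, so it diversifies better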
Complete Example: Multi-Factor Strategy Analysis
from finlab import data
from finlab.tools import factor_analysis as fa
from finlab.backtest import sim
# Step 1: Build factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')
factor1 = marketcap.rank(pct=True, axis=1) < 0.3 # Small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7 # Revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7 # Momentum
# Step 2: Build MultiIndex features and excess-return labels (see API Reference)
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3}, resample='M'
)
# Step 3: Analyze each factor
print("=== Factor Return Analysis ===")
fr = fa.calc_factor_return(features, labels)
ic = fa.calc_ic(features.astype(float), labels)
for name in features.columns:
    print(f"\n{name}:")
    print(f"  Average return: {fr[name].mean():.2%}")
    print(f"  Average IC: {ic[name].mean():.4f}")
# Step 4: Calculate Shapley values (contribution)
print("\n=== Shapley Values Analysis ===")
shapley = fa.calc_shapley_values(features, labels)
for name, value in shapley.mean().items():
    print(f"{name} contribution: {value:.4f}")
# Step 5: Backtest strategy
position = factor1 & factor2 & factor3
report = sim(position, resample='M')
report.display()
print("\n=== Strategy Performance ===")
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")
FAQ
Q: How do I determine if a factor is effective?
Use three metrics for comprehensive evaluation:
# Assumes features (one boolean column named 'factor') and labels come from
# fa.generate_features_and_labels({'factor': factor}, resample='M')
# 1. Factor return (should be significantly > 0)
factor_return = fa.calc_factor_return(features, labels)['factor']
print(f"Average factor return: {factor_return.mean():.2%}")
# 2. IC (should be > 0.02)
ic = fa.calc_ic(features.astype(float), labels)['factor']
print(f"Average IC: {ic.mean():.4f}")
# 3. IC_IR (should be > 0.5)
ic_ir = ic.mean() / ic.std()
print(f"IC_IR: {ic_ir:.2f}")
# Conclusion: Factor is effective only when all three metrics pass
Q: How do I apply Shapley values to strategy optimization?
# Calculate Shapley values for all factors (factor1 ... factor5 are boolean DataFrames)
factors = {'f1': factor1, 'f2': factor2, 'f3': factor3, 'f4': factor4, 'f5': factor5}
features, labels = fa.generate_features_and_labels(factors, resample='M')
shapley = fa.calc_shapley_values(features, labels).mean()
# Remove factors with low or negative contribution
threshold = 0.001
valid_factors = [factors[name] for name, s in shapley.items() if s > threshold]
print(f"Original factor count: {len(factors)}")
print(f"Optimized factor count: {len(valid_factors)}")
# Use optimized factor combination
optimized_position = valid_factors[0]
for factor in valid_factors[1:]:
    optimized_position = optimized_position & factor
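The final combining loop is a fold over the list of surviving factors, which functools.reduce expresses directly. The sketch below uses small hypothetical boolean DataFrames in place of real factors.

```python
from functools import reduce
from operator import and_

import pandas as pd

dates = pd.date_range("2024-01-01", periods=2)
factor_a = pd.DataFrame({"1101": [True, True], "2330": [True, False]}, index=dates)
factor_b = pd.DataFrame({"1101": [True, False], "2330": [True, True]}, index=dates)
factor_c = pd.DataFrame({"1101": [False, True], "2330": [True, True]}, index=dates)

valid_factors = [factor_a, factor_b, factor_c]
# Element-wise AND across all factors: a stock is held only if every factor selects it.
position = reduce(and_, valid_factors)
```

Like the loop it mirrors, reduce fails on an empty list, so it is worth checking that at least one factor passed the threshold before combining.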
Q: What is the difference between IC and factor return?
- IC (Information Coefficient):
  - Measures the correlation between factor values and future returns
  - Range [-1, 1], closer to 1 is better
  - Suitable for evaluating continuous factors (e.g., market cap, ROE)
- Factor Return:
  - Measures the average return of stocks selected by the factor
  - Suitable for evaluating binary factors (True/False)
  - Directly reflects stock selection effectiveness
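# Use IC for continuous factors
continuous_factor = data.get('price_earning_ratio:本益比')
features, labels = fa.generate_features_and_labels({'pe': continuous_factor}, resample='M')
ic = fa.calc_ic(features, labels)
# Use factor return for binary factors
# Compare each row with its own 30% quantile (align the threshold on the date index)
binary_factor = continuous_factor.lt(continuous_factor.quantile(0.3, axis=1), axis=0)
features, labels = fa.generate_features_and_labels({'low_pe': binary_factor}, resample='M')
factor_return = fa.calc_factor_return(features, labels)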
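When a continuous factor is turned into a binary one with a per-date quantile, the threshold is a Series indexed by date, so it has to be aligned on the row index. A plain `<` between a DataFrame and a Series aligns on column labels instead. The sketch below shows the row-aligned comparison on made-up values.

```python
import pandas as pd

factor = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [40.0, 30.0, 20.0, 10.0]],
    index=pd.to_datetime(["2024-01-31", "2024-02-29"]),
    columns=["1101", "2330", "2603", "2882"],
)
threshold = factor.quantile(0.3, axis=1)   # one threshold per date
binary = factor.lt(threshold, axis=0)      # compare each row with its own threshold
```

Each date ends up selecting only its lowest-valued stock here. An equivalent pattern used elsewhere on this page is factor.rank(pct=True, axis=1) < 0.3, although ranks and quantiles can disagree slightly at the boundary.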
Resources
- Factor Analysis Tutorial - End-to-end examples
- Complete Strategy Development Workflow - Includes factor analysis steps
- Machine Learning Strategies - Use ML to automatically discover factors