finlab.tools

Utility module providing event study and factor analysis functionality.

Use Cases

Analyze the impact of specific events on stock prices (event study)
Evaluate factor stock selection ability (factor analysis)
Optimize factor combinations in strategies
Understand the sources of strategy performance

Quick Examples

Event Study

from finlab.tools import event_study

# Analyze stock price performance after revenue announcements
# (Requires actual event data)

Factor Analysis

from finlab import data
from finlab.tools import factor_analysis as fa

# Load raw data
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')

# Build boolean factors
cond1 = marketcap.rank(pct=True, axis=1) < 0.3  # Small cap
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # Revenue growth

# Build aligned features and excess-return labels
# (monthly resampling is an illustrative choice)
features, labels = fa.generate_features_and_labels(
    {'small_cap': cond1, 'revenue_growth': cond2}, resample='M'
)

# Calculate factor return (one column per factor)
factor_return = fa.calc_factor_return(features, labels)
print(factor_return)

Detailed Guide

See Factor Analysis Tutorial for:

Complete factor analysis workflow
Factor return calculation
Factor IC analysis
Shapley values contribution analysis

API Reference

event_study()

finlab.tools.event_study

create_factor_data

create_factor_data(
    factor, adj_close, days=None, event=None
)

Create factor data that includes future returns.

PARAMETER	DESCRIPTION
`factor`	factor data where the index is datetime and the columns are asset IDs TYPE: `DataFrame`
`adj_close`	adjusted close prices where the index is datetime and the columns are asset IDs TYPE: `DataFrame`
`days`	future return considered TYPE: `list[int]` DEFAULT: `None`
`event`	event mask where the index is datetime and the columns are asset IDs, marking when each asset's event occurs (inferred from the example below) TYPE: `DataFrame` DEFAULT: `None`

Return

Analytic plots and tables

Warning

This function is not identical to finlab.ml.alphalens.create_factor_data

Examples:

Cash capital increase/reduction analysis

import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study

factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')

# create event dataframe: True where a stock has an ex-rights trading date
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()

# calculate factor_data
factor_data = create_factor_data({'pb':factor}, adj_close, event=event)

# abnormal returns of each stock on each day around the event
r = event_study(factor_data, benchmark, adj_close)

# mean abnormal return per day (bars) and its cumulative sum (line)
plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)

event_study

event_study(
    factor_data,
    benchmark_adj_close,
    stock_adj_close,
    sample_period=(-45, -20),
    estimation_period=(-5, 20),
    plot=True,
)

Run an event study and return the abnormal returns of each stock on each day.

PARAMETER	DESCRIPTION
`factor_data`	factor data where the index is datetime and the columns are asset IDs TYPE: `DataFrame`
`benchmark_adj_close`	benchmark for CAPM TYPE: `DataFrame`
`stock_adj_close`	stock price for CAPM TYPE: `DataFrame`
`sample_period`	period for fitting CAPM TYPE: `(int, int)` DEFAULT: `(-45, -20)`
`estimation_period`	period for calculating alpha (abnormal return) TYPE: `(int, int)` DEFAULT: `(-5, 20)`
`plot`	plot the result TYPE: `bool` DEFAULT: `True`

Return

Abnormal returns of each stock on each day.

Examples:

Cash capital increase/reduction analysis

import matplotlib.pyplot as plt
from finlab import data
from finlab.tools.event_study import create_factor_data
from finlab.tools.event_study import event_study

factor = data.get('price_earning_ratio:股價淨值比')
adj_close = data.get('etl:adj_close')
benchmark = data.get('benchmark_return:發行量加權股價報酬指數')

# create event dataframe: True where a stock has an ex-rights trading date
dividend_info = data.get('dividend_announcement')
v = dividend_info[['stock_id', '除權交易日']].set_index(['stock_id', '除權交易日'])
v['value'] = 1
event = v[~v.index.duplicated()].reset_index().drop_duplicates(
    subset=['stock_id', '除權交易日']
).pivot(index='除權交易日', columns='stock_id', values='value').notna()

# calculate factor_data
factor_data = create_factor_data({'pb':factor}, adj_close, event=event)

# abnormal returns of each stock on each day around the event
r = event_study(factor_data, benchmark, adj_close)

# mean abnormal return per day (bars) and its cumulative sum (line)
plt.bar(r.columns, r.mean().values)
plt.plot(r.columns, r.mean().cumsum().values)

plot_event_study

plot_event_study(returns)

Plot the event study for the given returns.

PARAMETER	DESCRIPTION
`returns`	A DataFrame containing the returns data. TYPE: `DataFrame`

Return

ax (matplotlib.axes.Axes): The axes object containing the plot.

Event Study: Analyze the impact of specific events on stock prices.

Usage Notes

Event study is used to analyze the short-term and long-term impact of specific events (such as earnings announcements, investor conferences, dividend distributions) on stock prices.
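
The `event_study` parameters describe two windows relative to each event date: `sample_period` for fitting CAPM and `estimation_period` for measuring abnormal returns. The sketch below illustrates that idea with a plain market-model regression on synthetic data. It is not finlab's implementation; the function name `abnormal_returns` and all numbers are hypothetical.

```python
import numpy as np

def abnormal_returns(stock_ret, bench_ret, event_idx,
                     sample_period=(-45, -20), estimation_period=(-5, 20)):
    """Market-model abnormal returns around one event (illustrative only)."""
    # Fit stock = alpha + beta * benchmark on the sample window.
    s0, s1 = event_idx + sample_period[0], event_idx + sample_period[1]
    beta, alpha = np.polyfit(bench_ret[s0:s1 + 1], stock_ret[s0:s1 + 1], deg=1)

    # Abnormal return = realised return minus the model's expected return.
    e0, e1 = event_idx + estimation_period[0], event_idx + estimation_period[1]
    expected = alpha + beta * bench_ret[e0:e1 + 1]
    return stock_ret[e0:e1 + 1] - expected

# Synthetic daily returns: the stock follows the model exactly,
# except for a 3% jump on the event day (row 60).
rng = np.random.default_rng(0)
bench = rng.normal(0.0, 0.01, size=120)
stock = 0.0005 + 1.2 * bench
stock[60] += 0.03

ar = abnormal_returns(stock, bench, event_idx=60)
print(ar.round(4))
```

With the default windows, the result has 26 entries (days -5 to 20), and the jump shows up at offset 5, the event day itself.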

factor_analysis

finlab.tools.factor_analysis

Factor analysis toolkit -- public facade.

All public symbols are re-exported from focused submodules so that existing `from finlab.tools.factor_analysis import ...` imports keep working unchanged.

Submodules

finlab.tools.factor_metrics -- IC, correlation, NDCG scoring
finlab.tools.factor_returns -- boolean factor returns & Shapley values
finlab.tools.factor_centrality -- PCA-based rolling centrality
finlab.tools.factor_regression -- OLS trend analysis

calc_centrality

calc_centrality(return_df, window_periods, n_components=1)

Compute rolling PCA centrality over return_df.

PARAMETER	DESCRIPTION
`return_df`	Time-series DataFrame (dates x assets/factors). TYPE: `DataFrame`
`window_periods`	Rolling window length in rows. TYPE: `int`
`n_components`	Number of PCA components. TYPE: `int` DEFAULT: `1`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of rolling centrality scores.

calc_factor_return

calc_factor_return(features, labels)

Compute equal-weight portfolio returns per boolean factor.

Each column in features must be boolean. For every date the mean label value across selected (True) stocks is returned.

PARAMETER	DESCRIPTION
`features`	Boolean feature DataFrame (MultiIndex: date x stock). TYPE: `DataFrame`
`labels`	Excess-return labels with the same MultiIndex. TYPE: `Series`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of per-period returns, one column per factor, starting from the first fully-populated date.

RAISES	DESCRIPTION
`ValueError`	If any feature column is not boolean.
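
To make the expected input layout concrete, here is a minimal pandas sketch of the rule described above: for every date, take the mean label of the stocks where a boolean column is True. The helper `factor_return_sketch` and the toy data are hypothetical, and the library's own handling of missing dates or alignment may differ.

```python
import pandas as pd

# Toy inputs: MultiIndex (date, stock), boolean features, excess-return labels.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["1101", "2330", "2454"]],
    names=["date", "stock"],
)
features = pd.DataFrame(
    {
        "small_cap": [True, False, True, False, True, True],
        "momentum": [False, True, True, True, False, False],
    },
    index=index,
)
labels = pd.Series([0.02, -0.01, 0.04, 0.03, 0.01, -0.03], index=index)

def factor_return_sketch(features, labels):
    """Mean label of the selected (True) stocks, per date and per column."""
    out = {}
    for name in features.columns:
        selected = labels[features[name]]  # keep rows where the factor is True
        out[name] = selected.groupby(level="date").mean()
    return pd.DataFrame(out)

result = factor_return_sketch(features, labels)
print(result)
```

On the first date, `small_cap` selects 1101 and 2454, so its return is the mean of 0.02 and 0.04.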

calc_ic

calc_ic(features, labels, rank=False)

Compute per-date IC between features and labels.

PARAMETER	DESCRIPTION
`features`	MultiIndex DataFrame (date, stock) with factor columns. TYPE: `DataFrame`
`labels`	MultiIndex Series with the same index. TYPE: `Series`
`rank`	If `True`, rank features before computing correlation. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of IC values per date and factor.
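
As a rough illustration of a per-date IC, the sketch below correlates each feature column with the labels inside every date group, optionally ranking the features first. It assumes Pearson correlation and uses a hypothetical helper `ic_sketch` with toy data; it is not the library's code.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["A", "B", "C", "D"]],
    names=["date", "stock"],
)
features = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]}, index=index
)
labels = pd.Series([0.01, 0.02, 0.03, 0.04, 0.01, 0.02, 0.03, 0.04], index=index)

def ic_sketch(features, labels, rank=False):
    """Cross-sectional correlation between each feature and the labels, per date."""
    rows = {}
    for date, block in features.groupby(level="date"):
        if rank:
            block = block.rank()  # rank within this date only
        # corrwith aligns the labels to this date's rows via the shared index
        rows[date] = block.corrwith(labels)
    return pd.DataFrame(rows).T  # rows: dates, columns: factors

ic = ic_sketch(features, labels)
print(ic)
```

In the toy data the factor is perfectly aligned with the labels on the first date and perfectly reversed on the second, so the two IC values have opposite signs.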

calc_metric

calc_metric(factor, adj_close, days=None, func=corr)

Compute a cross-sectional metric between factor and forward returns.

PARAMETER	DESCRIPTION
`factor`	Single factor DataFrame or dict of named factor DataFrames. TYPE: `DataFrame \| dict[str, DataFrame]`
`adj_close`	Adjusted close prices. TYPE: `DataFrame`
`days`	Forward-return horizons (default `[10, 20, 60, 120]`). TYPE: `list[int] \| None` DEFAULT: `None`
`func`	Scoring callable applied per date group (default `corr`). TYPE: `Callable[[DataFrame], float]` DEFAULT: `corr`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with one column per `(factor_name, horizon)` pair.
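
The following sketch shows one way a cross-sectional metric against forward returns can be computed from wide (dates x stocks) inputs, for a single horizon. The forward-return formula and the helper name `forward_return` are assumptions made for illustration; the library's exact definition is not documented here.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=4, freq="D")
adj_close = pd.DataFrame(
    {
        "A": [10.0, 11.0, 12.1, 13.31],   # rises 10% per day
        "B": [20.0, 20.0, 20.0, 20.0],    # flat
        "C": [30.0, 27.0, 24.3, 21.87],   # falls 10% per day
    },
    index=dates,
)
# A constant toy factor that ranks A > B > C on every date.
factor = pd.DataFrame({"A": [3.0] * 4, "B": [2.0] * 4, "C": [1.0] * 4}, index=dates)

def forward_return(adj_close, days):
    """Return realised over the next `days` rows (assumed definition)."""
    return adj_close.shift(-days) / adj_close - 1

fwd = forward_return(adj_close, days=1)

# Row-wise (per-date) Pearson correlation between factor values and forward returns.
metric_by_date = factor.corrwith(fwd, axis=1)
print(metric_by_date)
```

The last date has no forward return, so its metric is missing; the other dates show a perfect positive relationship in this toy setup.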

calc_regression_stats

calc_regression_stats(
    df, p_value_threshold=0.05, r_squared_threshold=0.1
)

Run per-column OLS regressions and classify each trend.

PARAMETER	DESCRIPTION
`df`	Time-series DataFrame with a `DatetimeIndex`. TYPE: `DataFrame`
`p_value_threshold`	Significance level for trend classification. TYPE: `float` DEFAULT: `0.05`
`r_squared_threshold`	Minimum R-squared for a non-flat trend. TYPE: `float` DEFAULT: `0.1`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with columns `slope`, `intercept`, `r_squared`, `p_value`, `tail_estimate`, and `trend` (`"up"`/`"down"`/`"flat"`).

RAISES	DESCRIPTION
`ValueError`	If the DataFrame index is not a `DatetimeIndex`.

calc_shapley_values

calc_shapley_values(features, labels)

Compute Shapley values measuring each factor's marginal contribution.

Enumerates all 2^n subsets of factors, computes the equal-weight portfolio return of each subset, and distributes the return evenly among the factors in that subset.

PARAMETER	DESCRIPTION
`features`	Boolean feature DataFrame (MultiIndex: date x stock). TYPE: `DataFrame`
`labels`	Excess-return labels with the same MultiIndex. TYPE: `Series`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame of Shapley values, one column per factor.

RAISES	DESCRIPTION
`ValueError`	If features is empty, labels lacks a MultiIndex, or any feature column is not boolean.
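
For comparison, the textbook Shapley value weights each factor's marginal contribution over all subsets of the other factors. The sketch below implements that textbook formula on a hypothetical table of subset returns. Whether `calc_shapley_values` produces the same numbers is not confirmed here, since the description above speaks of an even split within each subset.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Textbook Shapley values for a coalition value function `value(frozenset)`."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution
        result[p] = total
    return result

# Hypothetical average returns for each combination of factors.
toy_returns = {
    frozenset(): 0.0,
    frozenset({"size"}): 0.010,
    frozenset({"momentum"}): 0.020,
    frozenset({"size", "momentum"}): 0.040,
}
values = shapley_values(["size", "momentum"], toy_returns.__getitem__)
print(values)
```

The values sum to the return of the full factor set, which is the efficiency property of Shapley values. Because all 2^n subsets are enumerated, the cost grows quickly with the number of factors.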

corr

corr(df)

Pearson correlation between the first two columns of df.

generate_features_and_labels

generate_features_and_labels(dfs, resample)

Build a feature matrix and an excess-return label vector.

Wraps finlab.ml.feature.combine and finlab.ml.label.excess_over_mean.

PARAMETER	DESCRIPTION
`dfs`	Dict mapping factor names to DataFrames (or callables). TYPE: `dict[str, DataFrame \| Callable[[], DataFrame]]`
`resample`	Resampling frequency / index passed to the ML helpers. TYPE: `str`

RETURNS	DESCRIPTION
`tuple[DataFrame, Series]`	`(features, labels)` tuple.

ic

ic(factor, adj_close, days=None)

Shorthand for calc_metric(factor, adj_close, days, func=corr).

is_boolean_series

is_boolean_series(series)

Check whether series contains only boolean-like values (including NaN).

Handles the case where older pandas converts bool + NaN to float.
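
A minimal sketch of such a check, assuming that "boolean-like" means every non-missing value equals 0/1 or False/True. The helper `looks_boolean` is a hypothetical stand-in, not the library function, and it expects bool, float, or object-of-bool input.

```python
import numpy as np
import pandas as pd

def looks_boolean(series):
    """True if every non-missing value is 0/1 or False/True (assumed rule)."""
    values = series.dropna().astype(float)  # True/False become 1.0/0.0
    return bool(values.isin([0.0, 1.0]).all())

plain = pd.Series([True, False, True])
upcast = pd.Series([1.0, 0.0, np.nan])      # a bool column stored as float with NaN
continuous = pd.Series([0.3, 1.0, np.nan])  # a genuine continuous factor

print(looks_boolean(plain), looks_boolean(upcast), looks_boolean(continuous))
```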

ndcg_k

ndcg_k(k)

Return an NDCG scorer truncated at rank k.
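
For readers unfamiliar with the metric, the sketch below computes a textbook NDCG truncated at rank k, using linear gains and a log2 discount. The gain definition used by the scorer that `ndcg_k` returns is not documented here, so treat the helper names and the relevance grades as hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(scores, relevances, k):
    """NDCG@k of the ordering induced by `scores` (higher score ranks first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

realised = [3, 0, 1, 2]              # hypothetical relevance grades per stock
good_scores = [0.9, 0.1, 0.5, 0.7]   # orders stocks exactly by relevance
bad_scores = [0.1, 0.9, 0.5, 0.3]    # orders stocks in reverse

print(ndcg_at_k(good_scores, realised, k=2))
print(ndcg_at_k(bad_scores, realised, k=2))
```

A factor that ranks stocks in the ideal order scores 1.0; the reversed ordering scores much lower because the top two positions hold the least relevant stocks.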

precision_at_rank

precision_at_rank(k)

Return a precision scorer that evaluates the top 1 - k quantile.

Factor analysis module for evaluating factor stock selection ability and contribution.

Key Functions:

Function	Description	Purpose
`calc_factor_return()`	Calculate factor return	Evaluate overall factor performance
`calc_ic()`	Calculate factor IC (Information Coefficient)	Evaluate factor correlation with future returns
`calc_shapley_values()`	Calculate Shapley values	Evaluate marginal contribution of each factor in multi-factor models
`calc_centrality()`	Calculate rolling PCA-based centrality	Track centrality scores of return series over a rolling window

calc_factor_return()

Calculate factor return to evaluate stock selection ability.

Usage Examples:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build factor: small cap
marketcap = data.get('etl:market_value')
factor = marketcap.rank(pct=True, axis=1) < 0.3

# Build aligned boolean features and excess-return labels
# (monthly resampling is an illustrative choice)
features, labels = fa.generate_features_and_labels({'small_cap': factor}, resample='M')

# Calculate factor return (one column per factor)
factor_return = fa.calc_factor_return(features, labels)['small_cap']

# View results
print(f"Average factor return: {factor_return.mean():.2%}")
print(f"Factor return std dev: {factor_return.std():.2%}")
print(f"Factor Sharpe ratio: {factor_return.mean() / factor_return.std():.2f}")

Interpretation:

Positive: Factor has stock selection ability (selected stocks outperform)
Negative: Factor works in reverse (consider inverting the signal)
Near 0: Factor has no stock selection ability

Best Practices

Use long backtest periods to verify factor stability
Compare factor performance across different market environments (bull/bear)
Combine with IC analysis to confirm factor effectiveness

calc_ic()

Calculate factor IC (Information Coefficient), measuring the correlation between factor and future returns.

Usage Examples:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build factor: revenue growth rate
revenue = data.get('monthly_revenue:當月營收')
factor = revenue.average(3) / revenue.average(12)  # 3-month avg vs 12-month avg

# Build aligned features and excess-return labels
# (monthly resampling is an illustrative choice)
features, labels = fa.generate_features_and_labels({'revenue_growth': factor}, resample='M')

# Calculate IC (one row per date, one column per factor)
ic = fa.calc_ic(features, labels)['revenue_growth']

# View results
print(f"Average IC: {ic.mean():.4f}")
print(f"IC std dev: {ic.std():.4f}")
print(f"IC_IR (IC Sharpe ratio): {ic.mean() / ic.std():.2f}")

# Visualize IC time series
ic.plot(title='Revenue Growth Factor IC')

IC Evaluation Standards:

IC Value	Rating	Description
> 0.05	Excellent	Factor has strong stock selection ability
0.03 ~ 0.05	Good	Factor has stock selection ability
0.01 ~ 0.03	Average	Factor has weak stock selection ability
< 0.01	Ineffective	Factor has no stock selection ability

IC Analysis Notes

IC > 0: Factor positively correlates with future returns (go long on high-scoring stocks)
IC < 0: Factor negatively correlates with future returns (go long on low-scoring stocks)
High IC variance: Factor is unstable, higher risk
IC_IR > 0.5: Factor performs well on a risk-adjusted basis

calc_shapley_values()

Calculate Shapley values to evaluate the marginal contribution of each factor in a multi-factor model.

Usage Examples:

from finlab import data
from finlab.tools import factor_analysis as fa

# Build multiple factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # Small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # Revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # Momentum

# Build aligned boolean features and excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3},
    resample='M',  # monthly resampling is an illustrative choice
)

# Calculate Shapley values (one column per factor)
shapley = fa.calc_shapley_values(features, labels)

# View results, averaged over time
print("Shapley values for each factor:")
for name, value in shapley.mean().items():
    print(f"{name}: {value:.4f}")

Interpretation:

High Shapley value: Factor contributes significantly, high importance
Low Shapley value: Factor contributes little, consider removing
Negative Shapley value: Factor is harmful, drags down overall performance

Application Recommendations

Remove factors with negative or near-zero Shapley values
Retain factors with high Shapley values
Re-evaluate strategy performance after removing factors

calc_centrality()

Calculate rolling PCA-based centrality (concentration) for a return time series (dates x assets/factors) over a window of `window_periods` rows.

Usage Examples:

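from finlab import data
from finlab.tools import factor_analysis as fa

# Build boolean factors and aligned labels
marketcap = data.get('etl:market_value')
close = data.get('price:收盤價')
features, labels = fa.generate_features_and_labels({
    'small_cap': marketcap.rank(pct=True, axis=1) < 0.3,
    'momentum': (close / close.shift(20)).rank(pct=True, axis=1) > 0.7,
}, resample='M')

# calc_centrality expects a dates x factors return DataFrame and a window length
factor_return = fa.calc_factor_return(features, labels)
centrality = fa.calc_centrality(factor_return, window_periods=12)  # 12 rows, illustrative

# Rolling centrality scores (DataFrame)
print(centrality.tail())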

Interpretation:

High concentration: Factor is concentrated in few stocks, insufficient diversification
Low concentration: Factor is distributed across many stocks, good diversification

Complete Example: Multi-Factor Strategy Analysis

from finlab import data
from finlab.tools import factor_analysis as fa
from finlab.backtest import sim

# Step 1: Build factors
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
close = data.get('price:收盤價')

factor1 = marketcap.rank(pct=True, axis=1) < 0.3  # Small cap
factor2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # Revenue growth
factor3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7  # Momentum

# Step 2: Build aligned features and excess-return labels
features, labels = fa.generate_features_and_labels(
    {'small_cap': factor1, 'revenue_growth': factor2, 'momentum': factor3},
    resample='M',  # monthly resampling, matching the backtest below
)

# Step 3: Analyze each factor
print("=== Factor Return Analysis ===")
factor_return = fa.calc_factor_return(features, labels)
ic = fa.calc_ic(features, labels)
for name in features.columns:
    print(f"\n{name}:")
    print(f"  Average return: {factor_return[name].mean():.2%}")
    print(f"  Average IC: {ic[name].mean():.4f}")

# Step 4: Calculate Shapley values (contribution)
print("\n=== Shapley Values Analysis ===")
shapley = fa.calc_shapley_values(features, labels)
for name, value in shapley.mean().items():
    print(f"{name} contribution: {value:.4f}")

# Step 5: Backtest strategy
position = factor1 & factor2 & factor3
report = sim(position, resample='M')
report.display()

print("\n=== Strategy Performance ===")
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")

FAQ

Q: How do I determine if a factor is effective?

Use three metrics for comprehensive evaluation:

# Assumes a boolean `factor` and:
# features, labels = fa.generate_features_and_labels({'my_factor': factor}, resample='M')

# 1. Factor return (should be significantly > 0)
factor_return = fa.calc_factor_return(features, labels)['my_factor']
print(f"Average factor return: {factor_return.mean():.2%}")

# 2. IC (should be > 0.02)
ic = fa.calc_ic(features, labels, rank=True)['my_factor']
print(f"Average IC: {ic.mean():.4f}")

# 3. IC_IR (should be > 0.5)
ic_ir = ic.mean() / ic.std()
print(f"IC_IR: {ic_ir:.2f}")

# Conclusion: Factor is effective only when all three metrics pass

Q: How do I apply Shapley values to strategy optimization?

# Calculate Shapley values for all factors (factor1..factor5 are boolean DataFrames)
factors = {'f1': factor1, 'f2': factor2, 'f3': factor3, 'f4': factor4, 'f5': factor5}
features, labels = fa.generate_features_and_labels(factors, resample='M')
shapley = fa.calc_shapley_values(features, labels).mean()  # average contribution per factor

# Remove factors with low or negative contribution
threshold = 0.001
valid_factors = [factors[name] for name, s in shapley.items() if s > threshold]

print(f"Original factor count: {len(factors)}")
print(f"Optimized factor count: {len(valid_factors)}")

# Use optimized factor combination
optimized_position = valid_factors[0]
for factor in valid_factors[1:]:
    optimized_position = optimized_position & factor

Q: What is the difference between IC and factor return?

IC (Information Coefficient):
Measures the correlation between factor values and future returns
Range [-1, 1]; values near 0 indicate no relationship, and a negative IC can still be useful with the signal inverted
Suitable for evaluating continuous factors (e.g., market cap, ROE)
Factor Return:
Measures the average return of stocks selected by the factor
Suitable for evaluating binary factors (True/False)
Directly reflects stock selection effectiveness

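# Use IC for continuous factors
continuous_factor = data.get('price_earning_ratio:本益比')
features, labels = fa.generate_features_and_labels({'pe': continuous_factor}, resample='M')
ic = fa.calc_ic(features, labels)

# Use factor return for binary factors (lowest 30% of P/E on each date)
# .lt(..., axis=0) compares each row against that date's own quantile
binary_factor = continuous_factor.lt(continuous_factor.quantile(0.3, axis=1), axis=0)
features, labels = fa.generate_features_and_labels({'low_pe': binary_factor}, resample='M')
factor_return = fa.calc_factor_return(features, labels)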

Resources

Factor Analysis Tutorial - End-to-end examples
Complete Strategy Development Workflow - Includes factor analysis steps
Machine Learning Strategies - Use ML to automatically discover factors