Analyzing Strategy Factors

Use factor analysis to decide which factors to add, remove, or modify based on data rather than intuition, and to understand how each factor relates to strategy returns.

Factor analysis workflow in the FinLab package:

Example Strategy
    |
    +--- Feature Engineering ----> Factor Features (Features)
    |
    +--- Label Engineering  ----> Excess Returns (Labels)
                |
                +--- calc_factor_return() ----> Factor Return
                |      |
                |      +-- calc_centrality() --> Factor Centrality
                |
                +--- calc_shapley_values() ---> Factor Contribution
                |
                +--- calc_ic() ----------------> Factor Correlation (IC)

We will demonstrate how to:

1. Convert an "Example Strategy" into "Features" and "Labels"
2. Use the above data to analyze factor performance

Example Strategy

This strategy is composed of three basic factors:

1. Market capitalization
2. Revenue
3. Momentum

from finlab import data
from finlab.backtest import sim

marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
price = data.get('etl:adj_close')

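# Factor 1: smallest 30% of stocks by market cap
cond1 = marketcap.rank(pct=True, axis=1) < 0.3
# Factor 2: top 30% by 3-month average revenue relative to the 12-month average
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7
# Factor 3: top 30% by 20-day price momentum
cond3 = (price / price.shift(20)).rank(pct=True, axis=1) > 0.7

# Hold only the stocks that pass all three conditions
pos = cond1 & cond2 & cond3

# Backtest with month-end rebalancing and plot the cumulative return
report = sim(pos, resample='ME', upload=False)
report.creturn.plot()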

Creating Features and Labels

from finlab import data
import finlab.ml.feature as feature
import finlab.ml.label as label

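# Combine the three boolean conditions into one feature table, sampled at month end
features = feature.combine({
    'marketcap': cond1,
    'revenue': cond2,
    'momentum': cond3
}, resample='ME')

# Labels: each stock's excess return over the following month
labels = label.excess_over_mean(index=features.index, resample='ME')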

features.dropna().head()

datetime	instrument	marketcap	revenue	momentum
2013-04-30	1101	False	True	True
2013-04-30	1102	False	True	True
2013-04-30	1103	False	True	False
2013-04-30	1104	False	True	False
2013-04-30	1108	False	True	False

labels.dropna().head()

datetime    instrument
2007-04-30  1101          0.042548
            1102          0.081764
            1103         -0.016863
            1104         -0.047492
            1108         -0.002247
dtype: float64
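
The label name suggests that each stock's forward return is measured against the cross-sectional average for the same period. The following is a minimal sketch of that idea only, assuming this definition; the actual `excess_over_mean` implementation may differ, and the returns and the helper name `excess_over_cross_section` are made up for illustration.

```python
from statistics import mean

# Hypothetical next-month returns for three stocks on one rebalance date.
forward_returns = {"1101": 0.05, "1102": 0.02, "1103": -0.01}

def excess_over_cross_section(returns):
    """Subtract the same-period average return from each stock's return."""
    avg = mean(returns.values())
    return {stock: r - avg for stock, r in returns.items()}

excess = excess_over_cross_section(forward_returns)
```

By construction the excess returns sum to zero within each period, so a positive label means the stock beat the average of its peers for that month.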

Factor Return

Factor Return measures the excess return brought by a factor over a period of time, commonly used to evaluate a factor's effectiveness and investment value.

In this analysis workflow, features are built based on different factors (such as market cap, revenue growth, momentum), and the factor return for each is calculated. By observing the cumulative factor return trend, you can assess the factor's long-term performance and stability, serving as an important basis for factor selection and portfolio construction.
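
One simple way to read a factor return for a boolean feature is the average label of the stocks that satisfy the condition in each period, accumulated over time. The sketch below assumes that definition purely for illustration; `calc_factor_return` may compute it differently, and the data and the helper name `period_factor_return` are hypothetical.

```python
from itertools import accumulate
from statistics import mean

# Hypothetical data: per period, one (factor_is_true, excess_return) pair per stock.
periods = {
    "2024-01": [(True, 0.04), (True, 0.02), (False, -0.03)],
    "2024-02": [(True, -0.01), (False, 0.01), (True, 0.03)],
}

def period_factor_return(rows):
    """Average excess return of the stocks selected by the factor."""
    selected = [ret for flag, ret in rows if flag]
    return mean(selected) if selected else 0.0

factor_returns = [period_factor_return(rows) for rows in periods.values()]
cumulative = list(accumulate(factor_returns))  # running sum, like .cumsum()
```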

The chart below shows the cumulative return for each factor.

from finlab.tools.factor_analysis import calc_factor_return
from finlab.plot import plot_line

plot_line(calc_factor_return(features, labels).cumsum(), unit='.0%', title='Factor Cumulative Return')

From the chart, we can observe that the monthly revenue factor performs significantly better, while the market cap factor performs worse.

Factor Centrality

Factor Centrality quantifies the "commonality" among factors; this guide also reads it as a gauge of how crowded a factor is.

Factor Centrality is defined as:

\[ \text{Centrality}_i = \frac{\lambda_i}{ \sum_{j=1}^{k}\lambda_j} \]

where \(\lambda_i\) is factor \(i\)'s contribution to the first principal component, and \(k\) is the total number of factors.
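
To make the formula concrete, the sketch below computes it for one window of made-up factor returns. It assumes that \(\lambda_i\) is the absolute loading of factor \(i\) on the first principal component of the covariance matrix; this is an interpretation for illustration, not necessarily what `calc_centrality` does internally. The leading eigenvector is found by power iteration to keep the example dependency-free.

```python
from statistics import mean

# Hypothetical monthly factor returns (rows = months, columns = factors).
names = ["marketcap", "revenue", "momentum"]
window = [
    [0.010, 0.020, 0.015],
    [-0.005, 0.012, 0.020],
    [0.002, -0.008, -0.010],
    [0.007, 0.015, 0.018],
    [-0.003, 0.004, 0.006],
]

def covariance_matrix(rows):
    """Sample covariance between the columns of rows."""
    k, n = len(rows[0]), len(rows)
    means = [mean(r[j] for r in rows) for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def first_component(cov, iterations=500):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(len(vec))) for row in cov]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec

# lambda_i: absolute loading on the first component (assumed definition).
loadings = [abs(x) for x in first_component(covariance_matrix(window))]
centrality = {name: lam / sum(loadings) for name, lam in zip(names, loadings)}
```

Because of the normalization in the denominator, the centralities of all factors sum to 1, so each value can be read as a share of the common movement.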

This metric reflects the commonality of factor returns:

Higher values indicate:
- Using this factor for stock selection has performed better recently.
- Higher centrality increases the risk of future mean reversion; close monitoring is needed.
Lower values indicate:
- Using this factor for stock selection has performed worse recently.
- Lower centrality means lower risk of future mean reversion.
- Consider monitoring the factor when crowding is low, and use it when crowding starts to rise (factor reversion effect).

from finlab.tools.factor_analysis import calc_centrality


plot_line(calc_centrality(calc_factor_return(features, labels), 12), title='Centrality')

In the chart above, the market cap factor currently has low crowding. You might consider switching to a different factor to make the strategy more stable.

Alternatively, you can closely monitor the market cap factor and use it when crowding starts to rise (factor reversion effect).

Factor Contribution (Shapley Values)

Shapley values are a mathematical method for fairly attributing portfolio returns to each factor.

We enumerate all possible factor combinations, calculate the return for each combination, and determine each factor's contribution to the returns.

We can even compute each factor's contribution at each point in time.

Note that computing Shapley values takes \(O(2^n)\) time, where \(n\) is the number of factors, because every factor combination must be evaluated. Computation can therefore be slow; for a large number of factors, consider using alternative methods.
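
The enumeration described above can be sketched with the standard Shapley formula. The per-combination returns in `coalition_return` are hypothetical numbers, not FinLab output, and `shapley_value` is an illustrative helper rather than the library's implementation.

```python
from itertools import combinations
from math import factorial

factors = ["marketcap", "revenue", "momentum"]

# Hypothetical average return earned by each factor combination.
coalition_return = {
    frozenset(): 0.000,
    frozenset({"marketcap"}): 0.002,
    frozenset({"revenue"}): 0.010,
    frozenset({"momentum"}): 0.006,
    frozenset({"marketcap", "revenue"}): 0.011,
    frozenset({"marketcap", "momentum"}): 0.007,
    frozenset({"revenue", "momentum"}): 0.018,
    frozenset({"marketcap", "revenue", "momentum"}): 0.020,
}

def shapley_value(factor):
    """Weighted average of the factor's marginal contribution over all coalitions."""
    others = [f for f in factors if f != factor]
    n = len(factors)
    total = 0.0
    for size in range(len(others) + 1):
        # Probability that exactly `size` other factors precede this one.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            base = frozenset(subset)
            gain = coalition_return[base | {factor}] - coalition_return[base]
            total += weight * gain
    return total

shapley = {f: shapley_value(f) for f in factors}
```

The contributions add up to the return of the full combination minus the return of the empty one, which is the sense in which the attribution is "fair". The dictionary needs \(2^n\) entries, which is where the exponential cost comes from.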

from finlab.tools.factor_analysis import calc_shapley_values


plot_line(calc_shapley_values(features, labels))

Factor Correlation (Information Coefficient)

Factor IC (Information Coefficient) is an important metric for measuring a factor's predictive ability, commonly used to evaluate the correlation between a factor and future returns.

The IC formula is:

\[ IC = \text{corr}(\text{Factor Score}, \text{Future Return}) \]

where \(\text{corr}\) is the correlation coefficient (Pearson, or Spearman when computed on ranks), \(\text{Factor Score}\) is the factor score, and \(\text{Future Return}\) is the asset return over a future period.

IC values always lie between -1 and 1:

IC close to 1 indicates strong positive predictive ability for future returns.
IC close to -1 indicates strong negative (contrarian) predictive ability for future returns.
IC close to 0 indicates almost no predictive ability for future returns.

In quantitative stock selection, IC is a key basis for judging factor effectiveness. Factors with higher IC are generally more worth including in the portfolio.
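
As a worked illustration of the IC formula, the sketch below computes both a Pearson IC and a rank-based IC for a single date. The scores and returns are made up, and the helpers `pearson` and `ranks` are written out by hand to keep the example dependency-free; this is not how `calc_ic` is implemented.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """Ranks from 1 to n (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

# Hypothetical factor scores and next-period returns for five stocks on one date.
factor_score = [0.8, 0.1, 0.5, 0.3, 0.9]
future_return = [0.04, -0.02, 0.01, 0.03, 0.05]

ic = pearson(factor_score, future_return)
rank_ic = pearson(ranks(factor_score), ranks(future_return))  # Spearman-style IC
```

In practice the calculation is repeated for every date, giving a time series of IC values. The rank-based variant is less sensitive to outliers in either the scores or the returns.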

from finlab.tools.factor_analysis import calc_ic

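# Use raw factor values (not boolean conditions) so that IC can measure correlation
features_raw = feature.combine({
    'marketcap': -marketcap,  # Negated so that smaller caps score higher
    'revenue': revenue.average(3) / revenue.average(12),
    'momentum': price / price.shift(20)
}, resample='ME')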

plot_line(calc_ic(features_raw, labels, rank=True))

Factor Trend Analysis

Overview

The "centrality", "contribution", and "correlation" values are all time series, so their trends can be analyzed with calc_regression_stats:

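from finlab.tools.factor_analysis import (
    calc_centrality,
    calc_factor_return,
    calc_regression_stats,
)

# Same centrality series as plotted in the Factor Centrality section (12-period window)
centrality_df = calc_centrality(calc_factor_return(features, labels), 12)
centrality_trend = calc_regression_stats(centrality_df)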

	slope	p_value	r_squared	tail_estimate	trend
marketcap	-0.000111	3.102740e-17	0.404468	0.0123	down
revenue	0.000018	4.861743e-03	0.056041	0.0087	flat
momentum	0.000093	1.146515e-17	0.412914	0.0215	up

Core Fields

Field	Description	Value Range	Meaning
slope	Linear regression slope	(-inf, +inf)	Trend direction and strength
p_value	Statistical significance	[0, 1]	Trend credibility
r_squared	Coefficient of determination	[0, 1]	Explanatory power of the linear model
tail_estimate	Tail estimate value	(-inf, +inf)	Predicted value at the end of the time series
trend	Trend classification	"up"/"down"/"flat"	Simplified trend judgment
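
The fields above can be reproduced in spirit with an ordinary least-squares fit against the time index. The sketch below is an assumption-laden illustration, not the `calc_regression_stats` implementation: the p-value uses a normal approximation to the t-test, the `alpha` cutoff is hypothetical, and only the `r_squared < 0.1` rule for "flat" comes from this guide.

```python
from statistics import NormalDist

def regression_stats(values, r2_threshold=0.1, alpha=0.05):
    """Fit values against 0..n-1 by least squares and summarize the trend."""
    n = len(values)
    xs = range(n)
    mean_x, mean_y = (n - 1) / 2, sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    syy = sum((y - mean_y) ** 2 for y in values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy) if syy else 0.0
    # Normal approximation to the slope t-test; reasonable for long series only.
    residual_var = max(syy - slope * sxy, 0.0) / (n - 2)
    std_err = (residual_var / sxx) ** 0.5
    z = abs(slope) / std_err if std_err > 0 else float("inf")
    p_value = 2 * (1 - NormalDist().cdf(z))
    trend = "flat"
    if p_value < alpha and r_squared >= r2_threshold:
        trend = "up" if slope > 0 else "down"
    return {"slope": slope, "p_value": p_value, "r_squared": r_squared,
            "tail_estimate": intercept + slope * (n - 1), "trend": trend}

# Hypothetical centrality series drifting upward with a little noise.
series = [0.30 + 0.002 * t + (0.005 if t % 2 else -0.005) for t in range(60)]
stats = regression_stats(series)
```

Here `tail_estimate` is the fitted value at the last observation, which smooths out the noise in the most recent data point.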

Value Ranges and Interpretations

slope

Positive: Upward trend, increasing crowding
Negative: Downward trend, decreasing crowding
Absolute value: Trend strength (larger = stronger)

p_value (Statistical Significance)

[0, 0.01): Highly significant (***)
[0.01, 0.05): Significant (**)
[0.05, 0.1): Marginally significant (*)
[0.1, 1]: Not significant

r_squared (Coefficient of Determination)

[0.8, 1.0]: Very strong explanatory power
[0.6, 0.8): Strong explanatory power
[0.4, 0.6): Moderate explanatory power
[0.2, 0.4): Weak explanatory power
[0, 0.2): Very weak explanatory power

Trend Combination Analysis

p_value	r_squared	slope	trend	Meaning
small	high	positive	up	Strong and stable upward trend
small	high	negative	down	Strong and stable downward trend
small	low	any	flat	Trend exists but effect is small / high noise
large	high	any	flat	Small sample, high noise, cannot determine
large	low	any	flat	Essentially no trend and model has no explanatory power

Case Studies

Marketcap (Market Cap Factor)

slope: -0.000111 (negative)
p_value: 3.10e-17 (highly significant)
r_squared: 0.40 (moderate explanatory power)
trend: down

Interpretation: The market cap factor's centrality shows a statistically highly significant downward trend, and the linear fit has moderate explanatory power.

Revenue (Revenue Factor)

slope: 0.000018 (positive)
p_value: 0.0049 (highly significant)
r_squared: 0.056 (very weak explanatory power)
trend: flat (because r_squared < 0.1)

Interpretation: The revenue factor has a statistically significant upward trend, but because the linear fit explains very little of the variation (r_squared < 0.1), it is classified as flat.

Momentum (Momentum Factor)

slope: 0.000093 (positive)
p_value: 1.15e-17 (highly significant)
r_squared: 0.41 (moderate explanatory power)
trend: up

Interpretation: The momentum factor's centrality shows a statistically highly significant upward trend, and the linear fit has moderate explanatory power. This suggests that the momentum factor is becoming "popular" and that more capital may be chasing this strategy.