Analyzing Strategy Factors
Use factor analysis to decide, based on data rather than intuition, whether to add, remove, or modify factors, and to understand how each factor relates to strategy returns.
Factor analysis workflow in the FinLab package:

```text
Example Strategy
 |
 +--- Feature Engineering -------> Factor Features (Features)
 |
 +--- Label Engineering ---------> Excess Returns (Labels)
 |
 +--- calc_factor_return() ------> Factor Return
 |     |
 |     +--- calc_centrality() ---> Factor Centrality
 |
 +--- calc_shapley_values() -----> Factor Contribution
 |
 +--- calc_ic() -----------------> Factor Correlation (IC)
```
We will demonstrate how to:
- Convert an "Example Strategy" into "Features" and "Labels"
- Use the above data to analyze factor performance
Example Strategy
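This strategy combines three basic factors:

1. Market capitalization: stocks in the smallest 30% by market cap
2. Revenue: stocks in the top 30% by revenue growth (3-month average revenue relative to the 12-month average)
3. Momentum: stocks in the top 30% by 20-day price return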
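```python
from finlab import data
from finlab.backtest import sim

marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')  # monthly revenue
price = data.get('etl:adj_close')

# Factor 1: smallest 30% of stocks by market cap
cond1 = marketcap.rank(pct=True, axis=1) < 0.3
# Factor 2: top 30% by revenue growth (3-month vs. 12-month average revenue)
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7
# Factor 3: top 30% by 20-day price momentum
cond3 = (price / price.shift(20)).rank(pct=True, axis=1) > 0.7

# Hold stocks that pass all three filters, rebalancing at month end
pos = cond1 & cond2 & cond3
report = sim(pos, resample='ME', upload=False)
report.creturn.plot()
```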
Creating Features and Labels
```python
from finlab import data
import finlab.ml.feature as feature
import finlab.ml.label as label

# Combine the three boolean factor conditions into one feature table, sampled at month end
features = feature.combine({
    'marketcap': cond1,
    'revenue': cond2,
    'momentum': cond3
}, resample='ME')

# Labels: next-month return in excess of the cross-sectional mean
labels = label.excess_over_mean(index=features.index, resample='ME')

features.dropna().head()
```
| datetime | instrument | marketcap | revenue | momentum |
|---|---|---|---|---|
| 2013-04-30 | 1101 | False | True | True |
| 2013-04-30 | 1102 | False | True | True |
| 2013-04-30 | 1103 | False | True | False |
| 2013-04-30 | 1104 | False | True | False |
| 2013-04-30 | 1108 | False | True | False |
The first few labels (labels.head()):

```text
datetime    instrument
2007-04-30  1101          0.042548
            1102          0.081764
            1103         -0.016863
            1104         -0.047492
            1108         -0.002247
dtype: float64
```
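The label step can be illustrated with a minimal pandas sketch. It assumes that an "excess over mean" label is each stock's next-period return minus the average next-period return of all stocks on the same date. excess_over_mean_sketch and its close-price input are hypothetical and are not FinLab's implementation of label.excess_over_mean.

```python
import pandas as pd


def excess_over_mean_sketch(close: pd.DataFrame) -> pd.Series:
    """Hypothetical label: next-period return minus that date's cross-sectional mean.

    close: prices sampled at the rebalance frequency (rows = dates, columns = stocks).
    Returns a Series indexed by (datetime, instrument), like the labels above.
    """
    fwd = close.shift(-1) / close - 1              # return from date t to t+1
    excess = fwd.sub(fwd.mean(axis=1), axis=0)     # subtract each date's average across stocks
    excess = excess.rename_axis(index="datetime", columns="instrument")
    return excess.stack().dropna()                 # long format; the last date has no future return


if __name__ == "__main__":
    dates = pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31"])
    close = pd.DataFrame({"A": [10.0, 11.0, 12.1], "B": [10.0, 10.0, 10.0]}, index=dates)
    print(excess_over_mean_sketch(close))
```

Because the mean is subtracted date by date, the labels on any given date sum to zero, so they measure relative rather than absolute performance.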
Factor Return
Factor return measures the excess return generated by a factor over a period of time and is commonly used to evaluate a factor's effectiveness and investment value.
In this workflow, features are built from different factors (such as market cap, revenue growth, and momentum), and the factor return of each is calculated. The trend of the cumulative factor return shows the factor's long-term performance and stability, which is an important basis for factor selection and portfolio construction.
The chart below shows the cumulative return for each factor.
```python
from finlab.tools.factor_analysis import calc_factor_return
from finlab.plot import plot_line

# Cumulative (summed) factor returns, one line per factor
plot_line(calc_factor_return(features, labels).cumsum(), unit='.0%', title='Factor Cumulative Return')
```
From the chart, we can observe that the monthly revenue factor performs significantly better, while the market cap factor performs worse.
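As a rough illustration of what a factor return can look like for boolean features, the sketch below assumes one simple definition: for each date, the average excess-return label of the stocks the factor selects. factor_return_sketch is a hypothetical helper; FinLab's calc_factor_return may define or weight factor returns differently.

```python
import pandas as pd


def factor_return_sketch(features: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    """Hypothetical factor return: mean label of the stocks each boolean factor selects.

    features: boolean columns indexed by (datetime, instrument)
    labels:   excess returns with the same index
    Returns one row per date and one column per factor.
    """
    out = {}
    for name in features.columns:
        # True where the factor selects the stock; missing values count as "not selected"
        selected = features[name].reindex(labels.index).eq(True)
        out[name] = labels.where(selected).groupby(level=0).mean()
    return pd.DataFrame(out)


if __name__ == "__main__":
    idx = pd.MultiIndex.from_product(
        [pd.to_datetime(["2024-01-31", "2024-02-29"]), ["1101", "1102"]],
        names=["datetime", "instrument"],
    )
    features = pd.DataFrame({"revenue": [True, False, False, True]}, index=idx)
    labels = pd.Series([0.02, -0.01, 0.03, 0.05], index=idx)
    print(factor_return_sketch(features, labels).cumsum())
```

Summing the per-date result with .cumsum(), as the plotting code above does, turns it into a cumulative curve. A date on which a factor selects no stocks yields NaN for that factor.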
Factor Centrality
Factor Centrality is a metric that quantifies the "commonality" among factors, and this guide uses it as a measure of factor crowding.
Factor Centrality is defined as:
where \(\lambda_j\) is the factor's contribution to the first principal component, and \(k\) is the total number of factors.
This metric reflects the commonality of factor returns:

- Higher values indicate:
  - Stock selection with this factor has performed well recently.
  - Higher centrality raises the risk of future mean reversion, so the factor needs close monitoring.
- Lower values indicate:
  - Stock selection with this factor has performed poorly recently.
  - Lower centrality means a lower risk of future mean reversion.
  - Consider monitoring the factor while crowding is low, and using it once crowding starts to rise (factor reversion effect).
```python
from finlab.tools.factor_analysis import calc_centrality

plot_line(calc_centrality(calc_factor_return(features, labels), 12), title='Centrality')
```
In the chart above, the market cap factor currently has low crowding. You might consider switching to a different factor to make the strategy more stable.
Alternatively, you can closely monitor the market cap factor and use it when crowding starts to rise (factor reversion effect).
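To make "contribution to the first principal component" concrete, here is a minimal numpy sketch under an assumed definition: over a trailing window of factor returns, take the eigenvector of the covariance matrix with the largest eigenvalue and report each factor's share of its absolute loadings. The helper names and the normalization (shares summing to 1, with no scaling by \(k\)) are assumptions and are not taken from FinLab's calc_centrality.

```python
import numpy as np
import pandas as pd


def first_pc_share(returns: pd.DataFrame) -> pd.Series:
    """Each factor's share of the absolute loadings on the first principal component."""
    cov = np.cov(returns.to_numpy(dtype=float), rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)     # eigenvalues are returned in ascending order
    loadings = np.abs(eigvecs[:, -1])    # eigenvector of the largest eigenvalue; abs removes sign ambiguity
    return pd.Series(loadings / loadings.sum(), index=returns.columns)


def rolling_centrality_sketch(factor_returns: pd.DataFrame, window: int = 12) -> pd.DataFrame:
    """Apply first_pc_share over a trailing window (hypothetical helper, not FinLab's API)."""
    clean = factor_returns.dropna()
    rows = {}
    for end in range(window, len(clean) + 1):
        chunk = clean.iloc[end - window:end]
        rows[chunk.index[-1]] = first_pc_share(chunk)  # label each window by its last date
    return pd.DataFrame(rows).T
```

A factor whose returns move with the dominant common component receives a large share; a factor whose returns are mostly independent of the others receives a share near zero.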
Factor Contribution (Shapley Values)
Shapley values are a game-theoretic method for fairly quantifying each factor's contribution to portfolio returns.
We enumerate all possible factor combinations, calculate the return of each combination, and take each factor's contribution to be its marginal effect on returns, averaged over all combinations.
We can even compute each factor's contribution at each point in time.
Note that exact Shapley value computation has time complexity \(O(2^n)\), where \(n\) is the number of factors, because every combination must be evaluated. Computation becomes slow as factors are added, so for a large number of factors, consider alternative methods.
```python
from finlab.tools.factor_analysis import calc_shapley_values

plot_line(calc_shapley_values(features, labels))
```
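Exact Shapley values can be sketched in plain Python by enumerating every coalition of factors and weighting each marginal contribution by \(|S|!\,(n-|S|-1)!/n!\). Here value is a hypothetical callable that returns the performance of a strategy built from a given factor set (for example, by backtesting only those conditions); it and the toy numbers in the usage example are made up for illustration and are not FinLab's calc_shapley_values.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    factors: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: each factor's marginal contribution, averaged over all coalitions.

    value(S) returns the performance (e.g. return) of a strategy built from factor set S.
    Needs n * 2**(n - 1) marginal-contribution evaluations, hence O(2^n).
    """
    n = len(factors)
    result = {}
    for f in factors:
        others = [g for g in factors if g != f]
        total = 0.0
        for size in range(n):  # sizes of the coalitions that exclude f
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


if __name__ == "__main__":
    # Toy value function (made up for illustration): 1% per factor,
    # plus a 2% bonus when revenue and momentum are used together.
    def toy_value(s: FrozenSet[str]) -> float:
        bonus = 0.02 if {"revenue", "momentum"} <= s else 0.0
        return 0.01 * len(s) + bonus

    print(shapley_values(["marketcap", "revenue", "momentum"], toy_value))
```

With the toy value function, the 2% bonus is split evenly between revenue and momentum, so the result is approximately 0.01 for marketcap and 0.02 each for revenue and momentum. Shapley values always sum to value(all factors) minus value(no factors).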
Factor Correlation (Information Coefficient)
Factor IC (Information Coefficient) is an important metric for measuring a factor's predictive ability, commonly used to evaluate the correlation between a factor and future returns.
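The IC formula is:

\[ \text{IC} = \text{corr}(\text{Factor Score}, \text{Future Return}) \]

where \(\text{corr}\) is the correlation coefficient (usually Pearson; when factor scores and returns are first converted to ranks, as in Rank IC, it is the Spearman correlation), \(\text{Factor Score}\) is the factor score, and \(\text{Future Return}\) is the asset return over a future period.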
IC values range from -1 to 1:
- IC close to 1 indicates strong positive predictive ability for future returns.
- IC close to -1 indicates strong negative (contrarian) predictive ability for future returns.
- IC close to 0 indicates almost no predictive ability for future returns.
In quantitative stock selection, IC is a key basis for judging factor effectiveness. Factors with a higher absolute IC are generally more worth including in the portfolio.
```python
from finlab.tools.factor_analysis import calc_ic

# Use raw (continuous) factor values instead of boolean conditions
features_raw = feature.combine({
    'marketcap': -marketcap,  # negated so that smaller caps score higher
    'revenue': revenue.average(3) / revenue.average(12),
    'momentum': price / price.shift(20)
}, resample='ME')

plot_line(calc_ic(features_raw, labels, rank=True))
```
Factor Trend Analysis
Overview
Centrality, contribution (Shapley values), and correlation (IC) all change over time. You can test each of these series for a trend with calc_regression_stats:

```python
from finlab.tools.factor_analysis import calc_regression_stats

centrality_df = calc_centrality(calc_factor_return(features, labels), 12)
centrality_trend = calc_regression_stats(centrality_df)
```
| factor | slope | p_value | r_squared | tail_estimate | trend |
|---|---|---|---|---|---|
| marketcap | -0.000111 | 3.102740e-17 | 0.404468 | 0.0123 | down |
| revenue | 0.000018 | 4.861743e-03 | 0.056041 | 0.0087 | flat |
| momentum | 0.000093 | 1.146515e-17 | 0.412914 | 0.0215 | up |
Core Fields
| Field | Description | Value Range | Meaning |
|---|---|---|---|
| slope | Linear regression slope | (-inf, +inf) | Trend direction and strength |
| p_value | Statistical significance | [0, 1] | Trend credibility |
| r_squared | Coefficient of determination | [0, 1] | Explanatory power of the linear model |
| tail_estimate | Tail estimate value | (-inf, +inf) | Predicted value at the end of the time series |
| trend | Trend classification | "up"/"down"/"flat" | Simplified trend judgment |
Value Ranges and Interpretations
slope
- Positive: upward trend (for centrality, increasing crowding)
- Negative: downward trend (for centrality, decreasing crowding)
- Absolute value: Trend strength (larger = stronger)
p_value (Statistical Significance)
- [0, 0.01): Highly significant (***)
- [0.01, 0.05): Significant (**)
- [0.05, 0.1): Marginally significant (*)
- [0.1, 1]: Not significant
r_squared (Coefficient of Determination)
- [0.8, 1.0]: Very strong explanatory power
- [0.6, 0.8): Strong explanatory power
- [0.4, 0.6): Moderate explanatory power
- [0.2, 0.4): Weak explanatory power
- [0, 0.2): Very weak explanatory power
Trend Combination Analysis
| p_value | r_squared | slope | trend | Meaning |
|---|---|---|---|---|
| small | high | positive | up | Strong and stable upward trend |
| small | high | negative | down | Strong and stable downward trend |
| small | low | any | flat | Trend exists but effect is small / high noise |
| large | high | any | flat | Small sample, high noise, cannot determine |
| large | low | any | flat | Essentially no trend and model has no explanatory power |
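The fields above can be reproduced with an ordinary least-squares fit of each series against a time index. The sketch below is a hypothetical stand-in for calc_regression_stats, not its implementation: the p-value uses a normal approximation to the t distribution (to avoid a SciPy dependency), and the classification thresholds (p_value < 0.05 and r_squared >= 0.1) are assumptions chosen to match the case studies below, such as revenue being classified as flat because r_squared < 0.1.

```python
import math

import numpy as np
import pandas as pd


def regression_stats_sketch(
    series: pd.Series, p_threshold: float = 0.05, r2_threshold: float = 0.1
) -> dict:
    """Fit y = intercept + slope * t by least squares and classify the trend (hypothetical)."""
    y = series.dropna().to_numpy(dtype=float)
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    fitted = intercept + slope * t
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

    # Two-sided p-value of the slope, using a normal approximation to the t distribution
    sxx = float(np.sum((t - t.mean()) ** 2))
    se = math.sqrt(ss_res / (n - 2) / sxx)
    if se > 0:
        p_value = math.erfc(abs(slope / se) / math.sqrt(2))
    else:
        p_value = 0.0 if slope != 0 else 1.0

    # Assumed rule: significant AND enough explanatory power, otherwise "flat"
    if p_value < p_threshold and r_squared >= r2_threshold:
        trend = "up" if slope > 0 else "down"
    else:
        trend = "flat"
    return {
        "slope": float(slope),
        "p_value": p_value,
        "r_squared": r_squared,
        "tail_estimate": float(intercept + slope * (n - 1)),  # fitted value at the last point
        "trend": trend,
    }


def regression_table_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """One row of statistics per column (e.g. per factor's centrality series)."""
    return pd.DataFrame({col: regression_stats_sketch(df[col]) for col in df.columns}).T
```

Separating "is the slope significant" (p_value) from "does a straight line describe the series" (r_squared) is what lets a long, noisy series have a tiny p-value and still be labeled flat.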
Case Studies
Marketcap (Market Cap Factor)
- slope: -0.000111 (negative)
- p_value: 3.10e-17 (highly significant)
- r_squared: 0.40 (moderate explanatory power)
- trend: down

Interpretation: The market cap factor's centrality shows a statistically highly significant downward trend, and the linear model has moderate explanatory power.
Revenue (Revenue Factor)
- slope: 0.000018 (positive)
- p_value: 0.0049 (highly significant)
- r_squared: 0.056 (very weak explanatory power)
- trend: flat (because r_squared < 0.1)

Interpretation: The revenue factor's centrality has a statistically significant upward slope, but because the linear model explains very little of the variation (r_squared < 0.1), the trend is classified as flat.
Momentum (Momentum Factor)
- slope: 0.000093 (positive)
- p_value: 1.15e-17 (highly significant)
- r_squared: 0.41 (moderate explanatory power)
- trend: up

Interpretation: The momentum factor's centrality shows a statistically highly significant upward trend. This suggests that the momentum factor is becoming increasingly "popular" (crowded), with more capital likely chasing similar strategies.