
Analyzing Strategy Factors

Use factor analysis to decide which factors to add, remove, or modify based on data rather than intuition, and to understand the relationship between each factor and strategy returns.

Factor Analysis Workflow in FinLab Package:

Example Strategy
    |
    +--- Feature Engineering ----> Factor Features (Features)
    |
    +--- Label Engineering  ----> Excess Returns (Labels)
                |
                +--- calc_factor_return() ----> Factor Return
                |      |
                |      +-- calc_centrality() --> Factor Centrality
                |
                +--- calc_shapley_values() ---> Factor Contribution
                |
                +--- calc_ic() ----------------> Factor Correlation (IC)

We will demonstrate how to:

  1. Convert an "Example Strategy" into "Features" and "Labels"
  2. Use the above data to analyze factor performance

Example Strategy

This strategy is composed of three basic factors:

  1. Market capitalization
  2. Revenue
  3. Momentum

from finlab import data
from finlab.backtest import sim

marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
price = data.get('etl:adj_close')

cond1 = marketcap.rank(pct=True, axis=1) < 0.3
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7
cond3 = (price / price.shift(20)).rank(pct=True, axis=1) > 0.7

pos = cond1 & cond2 & cond3

report = sim(pos, resample='ME', upload=False)
report.creturn.plot()
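
Each condition above ranks stocks against each other on the same date and keeps one tail of the ranking. The following is a minimal plain-Python sketch of that idea, mimicking what a percentile rank across one row does (rank divided by the number of stocks, ties sharing their average rank). The stock names and market caps are hypothetical, and `pct_rank` is an illustrative helper, not part of FinLab.

```python
def pct_rank(values):
    """Percentile rank within one cross-section (a single date).

    Each value gets rank / count, with tied values sharing their average rank.
    """
    n = len(values)
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # 1-based rank of the first tied value
        count = ordered.count(v)       # how many values are tied with v
        ranks.append((first + (first + count - 1)) / 2 / n)
    return ranks


# Hypothetical market caps for four stocks on one date
marketcap_row = {"A": 50.0, "B": 10.0, "C": 400.0, "D": 120.0}
ranks = dict(zip(marketcap_row, pct_rank(list(marketcap_row.values()))))

# Equivalent of `rank(pct=True) < 0.3`: keep the smallest 30% by market cap
small_caps = [stock for stock, r in ranks.items() if r < 0.3]
print(ranks)
print(small_caps)
```

With only four stocks, the `< 0.3` cutoff keeps just the smallest one; on a full market cross-section it keeps roughly the bottom 30%.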

Creating Features and Labels

from finlab import data
import finlab.ml.feature as feature
import finlab.ml.label as label

features =  feature.combine({
    'marketcap': cond1,
    'revenue': cond2,
    'momentum': cond3
}, resample='ME')

# Compute labels for the next 1 month
labels = label.excess_over_mean(index=features.index, resample='ME')

features.dropna().head()
datetime    instrument  marketcap  revenue  momentum
2013-04-30  1101        False      True     True
2013-04-30  1102        False      True     True
2013-04-30  1103        False      True     False
2013-04-30  1104        False      True     False
2013-04-30  1108        False      True     False
labels.dropna().head()
datetime    instrument
2007-04-30  1101          0.042548
            1102          0.081764
            1103         -0.016863
            1104         -0.047492
            1108         -0.002247
dtype: float64
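
Judging by its name, `excess_over_mean` labels each stock with its return relative to the average stock over the next period. The sketch below illustrates that idea for a single period in plain Python. It assumes an equal-weighted arithmetic mean across stocks, which may differ from FinLab's actual implementation, and the returns are hypothetical.

```python
def excess_over_mean(returns_by_stock):
    """Subtract the cross-sectional mean return from each stock's return."""
    mean_return = sum(returns_by_stock.values()) / len(returns_by_stock)
    return {stock: r - mean_return for stock, r in returns_by_stock.items()}


# Hypothetical next-month returns for three stocks
next_month_returns = {"A": 0.05, "B": -0.01, "C": 0.02}
excess = excess_over_mean(next_month_returns)
print(excess)
```

By construction the excess returns in one period sum to zero, so a positive label means the stock beat the average stock, not that it rose in absolute terms.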

Factor Return

Factor Return measures the excess return brought by a factor over a period of time, commonly used to evaluate a factor's effectiveness and investment value.

In this analysis workflow, features are built based on different factors (such as market cap, revenue growth, momentum), and the factor return for each is calculated. By observing the cumulative factor return trend, you can assess the factor's long-term performance and stability, serving as an important basis for factor selection and portfolio construction.

The chart below shows the cumulative return for each factor.

from finlab.tools.factor_analysis import calc_factor_return
from finlab.plot import plot_line

plot_line(calc_factor_return(features, labels).cumsum(), unit='.0%', title='Factor Cumulative Return')

From the chart, we can observe that the monthly revenue factor performs significantly better, while the market cap factor performs worse.
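
How `calc_factor_return` computes its result internally is not shown here. One simple definition, assumed in the sketch below, is the mean excess return of the stocks a boolean factor selects in each period, accumulated over time the same way `.cumsum()` does above. All data and the `factor_return` helper are hypothetical.

```python
from itertools import accumulate


def factor_return(selected, excess_returns):
    """Mean excess return of the stocks a boolean factor selects in one period."""
    picked = [excess_returns[stock] for stock, flag in selected.items() if flag]
    return sum(picked) / len(picked) if picked else 0.0


# Hypothetical data: two periods, three stocks
feature_by_period = [
    {"A": True, "B": False, "C": True},
    {"A": False, "B": True, "C": True},
]
excess_by_period = [
    {"A": 0.04, "B": -0.02, "C": -0.02},
    {"A": -0.01, "B": 0.03, "C": -0.02},
]

period_returns = [
    factor_return(flags, excess)
    for flags, excess in zip(feature_by_period, excess_by_period)
]
cumulative = list(accumulate(period_returns))  # running sum, like .cumsum()
print(period_returns)
print(cumulative)
```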

Factor Centrality

Factor Centrality is a metric that quantifies the "commonality" among factor returns. In the rest of this page, a factor with high centrality is also described as crowded.

Factor Centrality is defined as:

\[ \text{Centrality}_i = \frac{\lambda_i}{ \sum_{j=1}^{k}\lambda_j} \]

where \(\lambda_i\) is factor \(i\)'s contribution to the first principal component, and \(k\) is the total number of factors.

This metric reflects the commonality of factor returns:

  • Higher values indicate:
    • Using this factor for stock selection has performed better recently.
    • Higher centrality increases the risk of future mean reversion; close monitoring is needed.
  • Lower values indicate:
    • Using this factor for stock selection has performed worse recently.
    • Lower centrality means lower risk of future mean reversion.
    • Consider monitoring the factor when crowding is low, and use it when crowding starts to rise (factor reversion effect).

from finlab.tools.factor_analysis import calc_centrality

plot_line(calc_centrality(calc_factor_return(features, labels), 12), title='Centrality')

In the chart above, the market cap factor currently has low crowding. You might consider switching to a different factor to make the strategy more stable.

Alternatively, you can closely monitor the market cap factor and use it when crowding starts to rise (factor reversion effect).
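
To make the centrality formula concrete, here is a rough plain-Python sketch for a single window of factor returns. It makes several assumptions that are not confirmed by FinLab: the first principal component is taken from the sample covariance matrix, a factor's "contribution" is the absolute value of its loading on that component, and the component is found by power iteration. The return series are hypothetical.

```python
import math


def covariance_matrix(columns):
    """Sample covariance matrix of equally long return series."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [
        [
            sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
            for cb, mb in zip(columns, means)
        ]
        for ca, ma in zip(columns, means)
    ]


def first_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def centrality(columns):
    """Absolute first-component loadings, normalized to sum to 1."""
    loadings = [abs(x) for x in first_component(covariance_matrix(columns))]
    total = sum(loadings)
    return [x / total for x in loadings]


# Hypothetical factor returns over six periods
factor_returns = {
    "marketcap": [0.010, -0.020, 0.030, -0.010, 0.020, -0.030],
    "revenue": [0.012, -0.018, 0.028, -0.012, 0.022, -0.028],
    "momentum": [0.001, 0.002, -0.001, 0.000, -0.002, 0.001],
}
result = dict(zip(factor_returns, centrality(list(factor_returns.values()))))
print(result)
```

In this toy data the first two series move together and dominate the first component, so they receive most of the centrality, while the third series, which moves little and independently, receives a small share. A rolling version would repeat the calculation over each trailing window.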

Factor Contribution (Shapley Values)

The Shapley value is a method from cooperative game theory for fairly attributing a total payoff, here the portfolio return, to each contributing factor.

We enumerate all possible factor combinations, calculate the return for each combination, and determine each factor's contribution to the returns.

We can even compute each factor's contribution at each point in time.

It is worth noting that the time complexity of Shapley Values computation is \(O(2^n)\), where \(n\) is the number of factors. Computation time can be long, so for a large number of factors, consider using alternative methods.

from finlab.tools.factor_analysis import calc_shapley_values


plot_line(calc_shapley_values(features, labels))
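
The enumeration described above can be sketched in plain Python as follows. This is the textbook exact Shapley formula, not necessarily how `calc_shapley_values` is implemented, and the return assigned to each factor combination is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(factors, value):
    """Exact Shapley values; `value` maps a frozenset of factors to a payoff."""
    n = len(factors)
    result = {}
    for factor in factors:
        others = [g for g in factors if g != factor]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of `factor` when added to the coalition
                total += weight * (value(coalition | {factor}) - value(coalition))
        result[factor] = total
    return result


# Hypothetical portfolio return for every combination of the three factors
combo_returns = {
    frozenset(): 0.00,
    frozenset({"marketcap"}): 0.01,
    frozenset({"revenue"}): 0.04,
    frozenset({"momentum"}): 0.02,
    frozenset({"marketcap", "revenue"}): 0.06,
    frozenset({"marketcap", "momentum"}): 0.03,
    frozenset({"revenue", "momentum"}): 0.07,
    frozenset({"marketcap", "revenue", "momentum"}): 0.10,
}
contributions = shapley_values(
    ["marketcap", "revenue", "momentum"], combo_returns.__getitem__
)
print(contributions)
```

The contributions add up to the return of the full three-factor combination minus the return of the empty one, which is what makes the attribution "fair". The table of combinations has \(2^n\) entries, which is where the exponential cost comes from.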

Factor Correlation (Information Coefficient)

Factor IC (Information Coefficient) is an important metric for measuring a factor's predictive ability, commonly used to evaluate the correlation between a factor and future returns.

The IC formula is:

\[ IC = \text{corr}(\text{Factor Score}, \text{Future Return}) \]

where \(\text{corr}\) is the correlation coefficient (usually Pearson), \(\text{Factor Score}\) is the factor score, and \(\text{Future Return}\) is the asset return over a future period.

IC values typically range between -1 and 1:

  • IC close to 1 indicates strong positive predictive ability for future returns.
  • IC close to -1 indicates strong negative (contrarian) predictive ability for future returns.
  • IC close to 0 indicates almost no predictive ability for future returns.

In quantitative stock selection, IC is a key basis for judging factor effectiveness. Factors with a higher IC are generally better candidates for inclusion in the portfolio.

from finlab.tools.factor_analysis import calc_ic

features_raw =  feature.combine({
    'marketcap': -marketcap, # Small cap (negated)
    'revenue': revenue.average(3) / revenue.average(12),
    'momentum': price / price.shift(20)
}, resample='ME')

plot_line(calc_ic(features_raw, labels, rank=True))
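
The call above passes `rank=True`, which presumably means the correlation is computed on ranks (a Spearman-style rank IC) rather than on raw values. The sketch below shows that calculation for a single date in plain Python: rank both series, then take the Pearson correlation of the ranks. The scores and returns are hypothetical, and the helpers are illustrative rather than FinLab functions.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def rank_ic(scores, future_returns):
    """Rank IC: Pearson correlation between the ranks of scores and returns."""
    return pearson(average_ranks(scores), average_ranks(future_returns))


# Hypothetical factor scores and next-period returns for five stocks
scores = [1.30, 0.95, 1.10, 0.80, 1.05]
future_returns = [0.04, 0.01, 0.03, -0.02, -0.01]
ic = rank_ic(scores, future_returns)
print(round(ic, 4))
```

Repeating this for every date gives the IC time series that the chart displays. Working on ranks makes the measure less sensitive to outliers in either the factor or the returns.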

Factor Trend Analysis

Overview

The "centrality", "contribution", and "correlation" series all change over time. Use calc_regression_stats to test whether each series is trending up, trending down, or flat:

from finlab.tools.factor_analysis import calc_regression_stats

# Same arguments as in the Factor Centrality section above
centrality_df = calc_centrality(calc_factor_return(features, labels), 12)
centrality_trend = calc_regression_stats(centrality_df)
centrality_trend
           slope      p_value       r_squared  tail_estimate  trend
marketcap  -0.000111  3.102740e-17  0.404468   0.0123         down
revenue    0.000018   4.861743e-03  0.056041   0.0087         flat
momentum   0.000093   1.146515e-17  0.412914   0.0215         up

Core Fields

Field          Description                   Value Range         Meaning
slope          Linear regression slope       (-inf, +inf)        Trend direction and strength
p_value        Statistical significance      [0, 1]              Trend credibility
r_squared      Coefficient of determination  [0, 1]              Explanatory power of the linear model
tail_estimate  Tail estimate value           (-inf, +inf)        Predicted value at the end of the time series
trend          Trend classification          "up"/"down"/"flat"  Simplified trend judgment
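
To show where slope, r_squared, and tail_estimate come from, here is a minimal ordinary-least-squares sketch in plain Python. It assumes the series is regressed on an integer time index (0, 1, 2, ...), which may not match the time axis FinLab uses, and it leaves out p_value because that requires a t-distribution (available, for example, through `scipy.stats.linregress`). The centrality values are hypothetical.

```python
def regression_stats(series):
    """Fit y = intercept + slope * t on t = 0..n-1 and summarize the fit."""
    n = len(series)
    t = list(range(n))
    mean_t = sum(t) / n
    mean_y = sum(series) / n
    sxx = sum((a - mean_t) ** 2 for a in t)
    sxy = sum((a - mean_t) * (y - mean_y) for a, y in zip(t, series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    fitted = [intercept + slope * a for a in t]
    ss_res = sum((y - f) ** 2 for y, f in zip(series, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in series)
    return {
        "slope": slope,
        "r_squared": 1.0 - ss_res / ss_tot,
        "tail_estimate": fitted[-1],  # fitted value at the last time step
    }


# Hypothetical centrality readings for one factor over five periods
stats = regression_stats([0.30, 0.32, 0.31, 0.35, 0.36])
print(stats)
```

Using the fitted value at the last time step, rather than the last raw observation, gives a smoothed estimate of where the series currently stands.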

Value Ranges and Interpretations

slope
  • Positive: Upward trend, increasing crowding
  • Negative: Downward trend, decreasing crowding
  • Absolute value: Trend strength (larger = stronger)
p_value (Statistical Significance)
  • [0, 0.01): Highly significant (***)
  • [0.01, 0.05): Significant (**)
  • [0.05, 0.1): Marginally significant (*)
  • [0.1, 1]: Not significant
r_squared (Coefficient of Determination)
  • [0.8, 1.0]: Very strong explanatory power
  • [0.6, 0.8): Strong explanatory power
  • [0.4, 0.6): Moderate explanatory power
  • [0.2, 0.4): Weak explanatory power
  • [0, 0.2): Very weak explanatory power

Trend Combination Analysis

p_value  r_squared  slope     trend  Meaning
small    high       positive  up     Strong and stable upward trend
small    high       negative  down   Strong and stable downward trend
small    low        any       flat   Trend exists but effect is small / high noise
large    high       any       flat   Small sample, high noise, cannot determine
large    low        any       flat   Essentially no trend and model has no explanatory power
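
The table above can be expressed as a small decision rule. The sketch below is an illustration only: the r_squared cutoff of 0.1 is the one quoted in the case studies on this page, while the p_value cutoff of 0.05 and the `classify_trend` helper itself are assumptions, not FinLab's documented behavior.

```python
def classify_trend(slope, p_value, r_squared, alpha=0.05, min_r_squared=0.1):
    """Return "up", "down", or "flat" following the combination table.

    alpha is an assumed significance cutoff; min_r_squared follows the
    r_squared < 0.1 rule mentioned in the case studies.
    """
    if p_value >= alpha or r_squared < min_r_squared or slope == 0:
        return "flat"
    return "up" if slope > 0 else "down"


# The three rows from the calc_regression_stats output shown earlier
rows = {
    "marketcap": (-0.000111, 3.102740e-17, 0.404468),
    "revenue": (0.000018, 4.861743e-03, 0.056041),
    "momentum": (0.000093, 1.146515e-17, 0.412914),
}
trends = {name: classify_trend(*values) for name, values in rows.items()}
print(trends)
```

Under these thresholds the rule reproduces the trend column of the example output, including the revenue factor being classified as flat despite its small p_value.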

Case Studies

Marketcap (Market Cap Factor)
slope: -0.000111 (negative)
p_value: 3.10e-17 (highly significant)
r_squared: 0.40 (moderate explanatory power)
trend: down

Interpretation: The market cap factor's centrality shows a statistically highly significant downward trend with moderate explanatory power, meaning its crowding has been decreasing steadily.

Revenue (Revenue Factor)
slope: 0.000018 (positive)
p_value: 0.0049 (highly significant)
r_squared: 0.056 (very weak explanatory power)
trend: flat (because r_squared < 0.1)

Interpretation: The revenue factor's centrality has a statistically significant upward slope, but the linear model explains very little of the variation (r_squared < 0.1), so the trend is classified as flat.

Momentum (Momentum Factor)
slope: 0.000093 (positive)
p_value: 1.15e-17 (highly significant)
r_squared: 0.41 (moderate explanatory power)
trend: up

Interpretation: The momentum factor's centrality shows a statistically highly significant upward trend with moderate explanatory power. This suggests that the momentum factor is becoming increasingly "popular" and that a growing amount of capital is chasing this strategy.