Factor Crowding Analysis (US Market)
Analyze factors systematically -- add, remove, and modify factors based on data rather than intuition. Below is the factor analysis workflow provided by the FinLab package:
Raw Data
|
|--- Feature Engineering -----> Factor Features
|
|--- Label Engineering -------> Excess Returns (Labels)
|
|--- calc_factor_return() -----> Factor Return
| |
| +-- calc_centrality() --> Factor Centrality
|
|--- calc_shapley_values() ----> Factor Contribution
|
+--- calc_ic() ----------------> Factor IC
- Generate features and labels: features are the factors, labels are excess stock returns.
- Compute factor returns: calc_factor_return(features, labels)
- Compute factor centrality (crowding): calc_centrality(factor_return, window)
- Compute factor contribution: calc_shapley_values(features, labels)
- Compute factor IC: calc_ic(features, labels)
- Compute factor Rank IC: calc_ic(features, labels, rank=True)
Example Strategy
This strategy uses three fundamental factors applied to the US market:
- Market Capitalization -- small-cap tilt
- Revenue Growth -- short-term vs. long-term revenue trend
- Momentum -- 20-day price return
First, let us look at the strategy performance:
from finlab import data
from finlab.backtest import sim
from finlab.market import USMarket

data.set_market('us')

close = data.get('price:adj_close')
close = close[close.index.dayofweek < 5]  # drop weekend rows
revenue = data.get('us_income_statement:revenue')
marketcap = data.get('us_key_metrics:marketCap')

# Factor conditions as cross-sectional percentile ranks
cond1 = marketcap.rank(pct=True, axis=1) < 0.3                                   # small-cap tilt
cond2 = (revenue.average(3) / revenue.average(12)).rank(pct=True, axis=1) > 0.7  # revenue growth
cond3 = (close / close.shift(20)).rank(pct=True, axis=1) > 0.7                   # 20-day momentum

pos = cond1 & cond2 & cond3
report = sim(pos, resample='ME', market=USMarket(), fee_ratio=0.001, tax_ratio=0)
report.creturn.plot()
Generating Features and Labels
Use feature.combine() to consolidate multiple factor conditions into a feature matrix, and label.excess_over_mean() to compute forward one-month excess returns as labels.
from finlab import data
import finlab.ml.feature as feature
import finlab.ml.label as label
features = feature.combine({
'marketcap': cond1,
'revenue': cond2,
'momentum': cond3
}, resample='ME')
# Compute forward 1-month labels
labels = label.excess_over_mean(index=features.index, resample='ME')
features.dropna().head()
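To make the label definition concrete, here is a toy plain-pandas sketch of excess-over-mean labels on hypothetical price data (an illustration of the idea, not the FinLab implementation): compute each stock's forward one-period return, then subtract that date's cross-sectional average return.

```python
import pandas as pd

# Hypothetical monthly close prices for three tickers
close = pd.DataFrame(
    {"AAA": [100.0, 110.0, 121.0],
     "BBB": [50.0, 50.0, 55.0],
     "CCC": [20.0, 22.0, 19.8]},
    index=pd.period_range("2024-01", periods=3, freq="M"),
)

# Forward 1-period return for each stock
fwd_ret = close.shift(-1) / close - 1

# Excess over the cross-sectional mean: subtract each date's average return
labels = fwd_ret.sub(fwd_ret.mean(axis=1), axis=0)
```

By construction, each date's excess returns sum to zero, so the labels measure relative (cross-sectional) performance rather than market direction.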
Factor Return
Factor return measures the excess return attributable to a given factor over a specified period. It is commonly used to evaluate the effectiveness and investment value of a factor.
In this workflow, features are constructed from different factors (market cap, revenue growth, momentum, etc.), and the factor return is computed for each. By plotting the cumulative factor return over time, you can observe the long-term performance and stability of each factor, which serves as a key input for factor selection and portfolio construction.
from finlab.tools.factor_analysis import calc_factor_return
from finlab.plot import plot_line
plot_line(
calc_factor_return(features, labels).cumsum(),
unit='.0%',
title='Factor Cumulative Return'
)
From the chart, you can observe which factors have stronger cumulative returns and which have been less effective.
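The internals of calc_factor_return are not shown in this document; a common construction, assumed here purely for illustration, is the long-short quantile spread: the mean forward return of the top factor quantile minus that of the bottom quantile, per date.

```python
import pandas as pd

def quantile_spread_return(factor: pd.DataFrame,
                           fwd_ret: pd.DataFrame,
                           q: float = 0.3) -> pd.Series:
    """Long-short factor return: mean forward return of the top factor
    quantile minus the bottom quantile, computed for each date (row)."""
    rank = factor.rank(pct=True, axis=1)
    top = fwd_ret.where(rank >= 1 - q)
    bottom = fwd_ret.where(rank <= q)
    return top.mean(axis=1) - bottom.mean(axis=1)

# Hypothetical single date: momentum scores and next-period returns for 4 stocks
factor = pd.DataFrame([[0.9, 0.1, 0.5, 0.7]], columns=list("ABCD"))
fwd = pd.DataFrame([[0.04, -0.02, 0.01, 0.03]], columns=list("ABCD"))
spread = quantile_spread_return(factor, fwd, q=0.25)
```

Here the top quantile (A, D) averages 3.5% and the bottom quantile (B) returns -2%, giving a 5.5% spread for that date.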
Factor Centrality
Factor centrality quantifies how "crowded" a factor has become -- it measures the commonality of factor returns.
Factor centrality is defined as the factor's normalized loading on the first principal component of factor returns:
\[
\text{Centrality}_j = \frac{|\lambda_j|}{\sum_{i=1}^{k} |\lambda_i|}
\]
where \(\lambda_j\) is the factor's loading on the first principal component, and \(k\) is the total number of factors.
This metric reflects the co-movement of factor returns:
- Higher centrality indicates:
- The factor has performed well recently in stock selection.
- High crowding increases the risk of future drawdowns -- monitor closely.
- Lower centrality indicates:
- The factor has underperformed recently in stock selection.
- Lower crowding means less risk of a sharp reversal.
- Watch for centrality to rise as a signal to re-enter the factor.
from finlab.tools.factor_analysis import calc_centrality
from finlab.plot import plot_line
plot_line(
calc_centrality(calc_factor_return(features, labels), 12),
title='Centrality'
)
When a factor's centrality is low, consider rotating to other factors for more stable performance. Alternatively, monitor the low-centrality factor and re-enter when its crowding begins to increase.
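As an illustration of the PCA-based idea (a self-contained sketch, not the FinLab implementation), a rolling centrality can be computed as each factor's normalized absolute loading on the first principal component of the factor-return correlation matrix over a trailing window:

```python
import numpy as np
import pandas as pd

def centrality(factor_return: pd.DataFrame, window: int) -> pd.DataFrame:
    """Rolling PCA-based crowding proxy: each factor's normalized absolute
    loading on the first principal component of trailing factor returns."""
    out = {}
    for end in range(window, len(factor_return) + 1):
        chunk = factor_return.iloc[end - window:end]
        corr = chunk.corr().to_numpy()
        eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
        loading = np.abs(eigvecs[:, -1])         # loading on the largest PC
        out[factor_return.index[end - 1]] = loading / loading.sum()
    return pd.DataFrame(out, index=factor_return.columns).T

# Hypothetical monthly factor returns for three factors
rng = np.random.default_rng(0)
fr = pd.DataFrame(rng.normal(size=(24, 3)),
                  columns=["size", "growth", "momentum"])
cent = centrality(fr, window=12)
```

Each row of the result sums to one, so a factor's centrality can be read as its share of the common co-movement over the window.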
Shapley Values
Shapley values quantify each factor's contribution to portfolio returns using a fair allocation method from cooperative game theory.
The approach enumerates all possible factor combinations, computes the return of each combination, and calculates each factor's marginal contribution. This can be computed for each factor at each point in time, providing a dynamic view of factor importance.
Note that computing Shapley values has time complexity \(O(2^n)\) where \(n\) is the number of factors, so computation time grows exponentially. For a large number of factors, consider alternative attribution methods.
from finlab.tools.factor_analysis import calc_shapley_values
from finlab.plot import plot_line
plot_line(calc_shapley_values(features, labels))
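To make the enumeration concrete, here is a generic, self-contained Shapley sketch (independent of FinLab) on a hypothetical additive game: each factor contributes a fixed return, so each factor's Shapley value should recover exactly its own contribution.

```python
from itertools import combinations
from math import factorial

def shapley(players, value):
    """Exact Shapley values via the subset-weighted formula: each player's
    marginal contribution, averaged over all orderings."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # weight |S|! * (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(frozenset(S) | {p}) - value(frozenset(S)))
        phi[p] = total
    return phi

# Hypothetical additive game: each factor adds a fixed return
contrib = {"size": 0.02, "growth": 0.05, "momentum": 0.03}
v = lambda S: sum(contrib[f] for f in S)
phi = shapley(list(contrib), v)
```

The inner loops visit all \(2^{n-1}\) subsets per player, which is exactly where the \(O(2^n)\) cost mentioned above comes from.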
Factor IC (Information Coefficient)
The Information Coefficient (IC) measures a factor's predictive power by computing the correlation between factor scores and subsequent returns.
The IC formula is:
\[
\text{IC} = \text{corr}(\text{Factor Score}, \text{Future Return})
\]
where \(\text{corr}\) denotes the correlation coefficient (typically Pearson or Spearman rank), \(\text{Factor Score}\) is the factor value, and \(\text{Future Return}\) is the asset's return over the subsequent period.
IC values range from -1 to 1:
- IC close to 1: The factor has strong positive predictive power for future returns.
- IC close to -1: The factor has strong negative (contrarian) predictive power.
- IC close to 0: The factor has little to no predictive power.
In quantitative stock selection, IC is a primary metric for evaluating factor effectiveness. Factors with higher IC are generally more valuable for inclusion in a portfolio.
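Conceptually, the per-date rank IC is just a cross-sectional Spearman correlation between factor scores and forward returns. A plain-pandas sketch on toy data (an illustration, not the FinLab implementation):

```python
import pandas as pd

def rank_ic(factor: pd.DataFrame, fwd_ret: pd.DataFrame) -> pd.Series:
    """Per-date Spearman rank IC: row-wise correlation between
    factor scores and subsequent returns."""
    return factor.corrwith(fwd_ret, axis=1, method="spearman")

# Hypothetical factor scores and forward returns for 4 stocks on 2 dates
factor = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                       [4.0, 3.0, 2.0, 1.0]], columns=list("ABCD"))
fwd = pd.DataFrame([[0.01, 0.02, 0.03, 0.04],
                    [0.01, 0.02, 0.03, 0.04]], columns=list("ABCD"))
ic = rank_ic(factor, fwd)
```

On the first date the factor ranks stocks in the same order as their returns (IC = 1); on the second the ordering is exactly reversed (IC = -1).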
Note: When computing IC, features should be continuous numerical values rather than boolean conditions, so we redefine the features here:
from finlab.tools.factor_analysis import calc_ic
features = feature.combine({
'marketcap': -marketcap, # small-cap tilt (negative sign)
'revenue': revenue.average(3) / revenue.average(12),
'momentum': close / close.shift(20)
}, resample='ME')
plot_line(calc_ic(features, labels, rank=True))
Key Parameters
- resample='ME': Month-end rebalancing frequency for both the strategy and the factor analysis.
- calc_factor_return(features, labels): Computes the return attributable to each factor by comparing the performance of stocks in the top vs. bottom quantile of each factor.
- calc_centrality(factor_return, 12): Uses a 12-period rolling window to compute PCA-based centrality, reflecting the recent crowding level.
- calc_shapley_values(features, labels): Enumerates all \(2^n\) factor subsets to fairly attribute returns. Keep \(n\) small (under 8-10 factors).
- calc_ic(features, labels, rank=True): Computes the Spearman rank IC when rank=True, which is more robust to outliers than Pearson IC.
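A quick illustration of why the rank IC is more robust to outliers than the Pearson IC, using hypothetical data with one extreme return: the Pearson correlation is dragged by the outlier, while the Spearman correlation depends only on the orderings.

```python
import pandas as pd

# One cross-section where the last stock has an extreme (outlier) return
factor = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
ret = pd.Series([0.01, 0.02, 0.03, 0.04, 5.00])  # 500% outlier

pearson_ic = factor.corr(ret)                       # distorted by the outlier
spearman_ic = factor.corr(ret, method="spearman")   # rank order is perfect
```

Even though the factor orders returns perfectly (Spearman IC = 1), the Pearson IC drops well below 1 because of the single extreme value.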
Expected Behavior
Factor analysis helps you understand why a strategy works and when it might stop working. Factor returns reveal long-term efficacy, centrality warns of crowding risk, Shapley values decompose contributions fairly, and IC measures raw predictive power. Used together, these tools enable data-driven factor rotation: reduce exposure to crowded factors and increase exposure to factors with improving IC and low centrality. In the US market, common factors like size, value, and momentum experience regime shifts, making this type of monitoring essential for adaptive strategy management.