Skip to content

finlab.dataframe

FinlabDataFrame extends Pandas DataFrame with automatic data alignment, technical indicator computation, and stock selection utilities.

Use Cases

  • Automatically align data of different frequencies (daily, weekly, monthly, quarterly)
  • Quickly perform stock selection (ranking, top-N filtering)
  • Compute moving averages and technical indicators
  • Manage entry/exit signals (hold until stop-loss/take-profit)
  • Industry neutralization
  • Cross-sectional factor preprocessing (df.cs: rank, winsorize, z-score)
  • Within-sector transforms and aggregation (df.sector: sector rank, mean, std)
  • Portfolio weight construction (df.weight: inverse volatility, risk parity, drawdown control)

Quick Examples

Auto-Alignment and Stock Selection

from finlab import data

# data.get() returns FinlabDataFrame
close = data.get('price:收盤價')  # Daily
revenue = data.get('monthly_revenue:當月營收')  # Monthly

# Automatic date alignment (revenue is forward-filled to daily frequency)
position = (close > close.average(20)) & (revenue > 1e8)

Ranking and Filtering

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Select the 30 smallest by market cap
small_cap = marketcap.is_smallest(30)

# Select the 20 largest by price momentum
momentum = (close / close.shift(20)).is_largest(20)

# Combine strategies
position = small_cap & momentum

Position Management

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Hold until 10% stop-loss or 20% take-profit
position_with_stops = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Backtest
report = sim(position_with_stops, resample='M')
report.display()

FinlabDataFrame vs pandas DataFrame

Feature pandas DataFrame FinlabDataFrame
Basic operations Fully supported Fully supported (inherits pandas)
Auto-alignment Requires manual handling Automatically aligns different frequency data
Stock selection Write your own Built-in is_largest(), is_smallest()
Technical indicators Requires extra packages Built-in average(), rise(), fall()
Position management Write your own Built-in hold_until(), exit_when()
Industry analysis Write your own Built-in industry_rank(), neutralize_industry()
Factor preprocessing Write your own Built-in df.cs cross-sectional transforms
Within-sector analysis Write your own Built-in df.sector sector transforms & aggregation
Weight construction Write your own Built-in df.weight with 8 weighting methods

API Reference

FinlabDataFrame

finlab.dataframe.FinlabDataFrame

FinlabDataFrame(*args, **kwargs)

Bases: IndexConversionMixin, SignalMixin, NeutralizeMixin, IndustryMixin, DataFrame

回測語法糖 除了使用熟悉的 Pandas 語法外,我們也提供很多語法糖,讓大家開發程式時,可以用簡易的語法完成複雜的功能,讓開發策略更簡潔! 我們將所有的語法糖包裹在 FinlabDataFrame 中,用起來跟 pd.DataFrame 一樣,但是多了很多功能! 只要使用 finlab.data.get() 所獲得的資料,皆為 FinlabDataFrame 格式, 接下來我們就來看看, FinlabDataFrame 有哪些好用的語法糖吧!

當資料日期沒有對齊(例如: 財報 vs 收盤價 vs 月報)時,在使用以下運算符號: +, -, *, /, >, >=, ==, <, <=, &, |, ~, 不需要先將資料對齊,因為 FinlabDataFrame 會自動幫你處理,以下是示意圖。

steps

以下是範例:cond1cond2 分別為「每天」,和「每季」的資料,假如要取交集的時間,可以用以下語法:

from finlab import data
# 取得 FinlabDataFrame
close = data.get('price:收盤價')
roa = data.get('fundamental_features:ROA稅後息前')

# 運算兩個選股條件交集
cond1 = close > 37
cond2 = roa > 0
cond_1_2 = cond1 & cond2
擷取 1101 台泥 的訊號如下圖,可以看到 cond1cond2 訊號的頻率雖然不相同, 但是由於 cond1cond2FinlabDataFrame,所以可以直接取交集,而不用處理資料頻率對齊的問題。
imageconds

總結來說,FinlabDataFrame 與一般 dataframe 唯二不同之處: 1. 多了一些 method,如df.is_largest(), df.sustain()...等。 2. 在做四則運算、不等式運算前,會將 df1、df2 的 index 取聯集,column 取交集。

average

average(n)

取 n 筆移動平均

若股票在時間窗格內,有 N/2 筆 NaN,則會產生 NaN。 Args: n (positive-int): 設定移動窗格數。 Returns: (pd.DataFrame): data Examples: 股價在均線之上

from finlab import data
close = data.get('price:收盤價')
sma = close.average(10)
cond = close > sma
只需要簡單的語法,就可以將其中一部分的訊號繪製出來檢查:
import matplotlib.pyplot as plt

close.loc['2021', '2330'].plot()
sma.loc['2021', '2330'].plot()
cond.loc['2021', '2330'].mul(20).add(500).plot()

plt.legend(['close', 'sma', 'cond'])
sma

fall

fall(n=1)

數值下降中

取是否比前第n筆低,若符合條件的值則為True,反之為False。 Screen-Shot-2021-10-26-at-6-43-41-AM Args: n (positive-int): 設定比較前第n筆低。 Returns: (pd.DataFrame): data Examples: 收盤價是否低於10日前股價

from finlab import data
data.get('price:收盤價').fall(10)

is_largest

is_largest(n)

取每列前 n 筆大的數值

若符合 True ,反之為 False 。用來篩選每天數值最大的股票。

is-largest Args: n (positive-int): 設定每列前 n 筆大的數值。 Returns: (pd.DataFrame): data Examples: 每季 ROA 前 10 名的股票

from finlab import data

roa = data.get('fundamental_features:ROA稅後息前')
good_stocks = roa.is_largest(10)

is_smallest

is_smallest(n)

取每列前 n 筆小的數值

若符合 True ,反之為 False 。用來篩選每天數值最小的股票。 Args: n (positive-int): 設定每列前 n 筆小的數值。 Returns: (pd.DataFrame): data Examples: 股價淨值比最小的 10 檔股票

from finlab import data

pb = data.get('price_earning_ratio:股價淨值比')
cheap_stocks = pb.is_smallest(10)

lazy

lazy()

Wrap this DataFrame into a lazy computation handle.

Returns a LazyWide proxy that records operations without executing them. Call .rebalance(freq) or .collect() on the result to trigger execution at specific dates only.

Example::

close = data.get('price:收盤價')
position = (
    close.lazy()
    .rolling(60).mean()
    .rank(axis=1, pct=True)
    .is_largest(10)
    .rebalance('M')
)

quantile_row

quantile_row(c)

股票當天數值分位數

取得每列c定分位數的值。 Args: c (positive-int): 設定每列 n 定分位數的值。 Returns: (pd.DataFrame): data Examples: 取每日股價前90%分位數

from finlab import data
data.get('price:收盤價').quantile_row(0.9)

rank

rank(*args, valid=None, **kwargs)

Cross-sectional or time-series ranking.

PARAMETER DESCRIPTION
valid

只有 valid 為 True 的 cell 才參與排名。 valid 為 False/NaN 的 cell 在排名前設為 NaN, 因此不影響 pct=True 的分母。 常見用法:fillna 後傳入原始 notna() mask, 避免新上市或歷史不足的股票污染百分位排名。

TYPE: DataFrame or Series of bool DEFAULT: None

rebalance

rebalance(freq)

Downsample to keep only the last observation per period.

Equivalent to self.resample(freq).last().

PARAMETER DESCRIPTION
freq

Pandas frequency string, e.g. 'ME' (month-end), 'W' (weekly), 'QE' (quarter-end), or explicit dates to select with reindex.

TYPE: str | Index | list

rise

rise(n=1)

數值上升中

取是否比前第n筆高,若符合條件的值則為True,反之為False。 Screen-Shot-2021-10-26-at-6-43-41-AM Args: n (positive-int): 設定比較前第n筆高。 Returns: (pd.DataFrame): data Examples: 收盤價是否高於10日前股價

from finlab import data
data.get('price:收盤價').rise(10)

Key FinlabDataFrame Methods:

Stock Selection and Ranking

Method Description Example
is_largest(n) Select the N largest values marketcap.is_largest(30)
is_smallest(n) Select the N smallest values pe_ratio.is_smallest(20)
rank() Compute ranking (supports industry ranking) close.rank(pct=True)
industry_rank() Ranking within industry close.industry_rank()

Technical Indicators

Method Description Example
average(n) N-period moving average close.average(20)
rise(n) Consecutive N-period rise close.rise(3)
fall(n) Consecutive N-period fall close.fall(2)
sustain(n) Condition sustained for N periods (close > ma20).sustain(5)

Position Management

Method Description Example
hold_until() Hold until stop-loss/take-profit position.hold_until(stop_loss=0.1)
exit_when() Exit on condition position.exit_when(close < ma20)
entry_price() Get entry price position.entry_price('close')

Industry Analysis

Method Description Example
neutralize_industry() Industry neutralization factor.neutralize_industry()
groupby_category() Group by industry close.groupby_category().mean()

average()

Compute moving average.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# 20-day moving average
ma20 = close.average(20)

# 60-day moving average
ma60 = close.average(60)

# Moving average crossover strategy
golden_cross = ma20 > ma60

Difference from pandas rolling()

  • average(20) is equivalent to rolling(20).mean()
  • More concise and readable
  • Return value is still a FinlabDataFrame, allowing method chaining

is_largest() / is_smallest()

Select top/bottom N by ranking.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Select the 30 smallest by market cap (small-cap stocks)
small_cap = marketcap.is_smallest(30)

# Select the 20 largest by momentum (momentum stocks)
momentum = (close / close.shift(20)).is_largest(20)

# Combine: momentum stocks among small-caps
position = small_cap & momentum

Notes

  • is_largest(n) re-ranks every period (daily/monthly)
  • If fewer than N stocks are available, all are selected
  • Returns a Boolean DataFrame (True = selected)

rank()

Compute ranking (supports percentile ranking).

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Compute percentile ranking (0-1)
cap_rank = marketcap.rank(pct=True, axis=1)

# Select top 30% by market cap
position = cap_rank > 0.7

# Or use is_largest (more intuitive)
position = marketcap.is_largest(int(len(marketcap.columns) * 0.3))

industry_rank()

Ranking within industry.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# Compute within-industry ranking (0-1)
industry_rank = close.industry_rank()

# Select top 30% performers within each industry
position = industry_rank > 0.7

Industry Classification Source

  • Uses TEJ industry classification
  • Automatically handles industry reclassification

rise() / fall()

Detect consecutive rises/falls.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# 3 consecutive days of rising
rising = close.rise(3)

# 2 consecutive days of falling
falling = close.fall(2)

# Strategy: reverse signal after consecutive rises
position = ~rising  # Inverse signal

sustain()

Check if a condition is sustained for N periods.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
ma20 = close.average(20)

# Price stays above 20-day MA for 5 consecutive days
sustained_above_ma = (close > ma20).sustain(5)

# Strategy
position = sustained_above_ma

sustain vs rise

  • rise(n): value increases for n consecutive periods
  • sustain(n): condition is True for n consecutive periods

hold_until()

Hold position until stop-loss/take-profit triggers.

Usage Examples:

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Basic usage: 10% stop-loss
position_with_sl = position.hold_until(stop_loss=0.1)

# 10% stop-loss + 20% take-profit
position_with_stops = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Backtest
report = sim(position_with_stops, resample='M')
report.display()

# Trailing stop requires setting in sim()
report = sim(position, resample='M', trail_stop=0.2)

Key Concepts

  • hold_until() extends the holding period until the trigger condition
  • Stop-loss/take-profit are calculated based on the entry price
  • Trailing stop must be set via sim(trail_stop=...), exits when price drops a specified percentage from the peak

exit_when()

Exit when a specific condition is met.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
ma20 = close.average(20)
ma60 = close.average(60)

# Entry: short MA crosses above long MA
entry = ma20 > ma60

# Exit: short MA crosses below long MA
exit_signal = ma20 < ma60

# Hold until exit signal
position = entry.exit_when(exit_signal)

exit_when vs hold_until

  • exit_when(): Custom exit conditions (e.g., technical indicators)
  • hold_until(): Fixed stop-loss/take-profit percentages

neutralize_industry()

Industry neutralization - removes industry factor effects.

Usage Examples:

from finlab import data

# Raw factor (with industry influence)
factor = data.get('fundamental_features:股東權益報酬率')

# Industry neutralization (removes industry average)
neutral_factor = factor.neutralize_industry()

# Use neutralized factor
position = neutral_factor.is_largest(30)

Why Neutralize?

  • Some industries naturally have higher ROE (e.g., financial sector)
  • Neutralization avoids over-concentration in a single industry
  • Improves strategy stability

Portfolio Weight Construction df.weight

The df.weight accessor provides methods for constructing and adjusting portfolio weights. All methods return FinlabDataFrame and support method chaining.

Method Overview

Method Description Example
cap_industry(max_weight) Cap industry weight, redistribute excess proportionally w.weight.cap_industry(0.3)
clip_by_volume(total_fund, max_participation_ratio, volume, window) Clip stock weight by average daily volume w.weight.clip_by_volume(1e7)
inverse_volatility(window, price) Inverse volatility weighting — low-vol stocks get higher weight w.weight.inverse_volatility(60)
risk_parity(window, price) Risk parity weighting — equal risk contribution per holding w.weight.risk_parity(60)
correlation(window, price, diversify) Adjust weights by correlation — diversify or concentrate w.weight.correlation(diversify=True)
target_volatility(target, window, price) Scale weights to achieve target annualized volatility w.weight.target_volatility(0.15)
limit_turnover(max_turnover) Cap two-way turnover between rebalances w.weight.limit_turnover(0.3)
drawdown_control(max_drawdown, price) Reduce exposure when drawdown exceeds threshold w.weight.drawdown_control(0.15)

Usage Example

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Stock selection
position = marketcap.is_smallest(30)

# Chain weight construction
weighted = (
    position
    .weight.inverse_volatility(window=60)     # Inverse volatility weighting
    .weight.cap_industry(max_weight=0.3)      # Industry cap at 30%
    .weight.limit_turnover(max_turnover=0.3)  # Turnover cap at 30%
    .weight.drawdown_control(max_drawdown=0.15) # Drawdown control at 15%
)

report = sim(weighted, resample='M')
report.display()

Method ordering

  • Apply weighting first (inverse_volatility / risk_parity), then constraints (cap_industry / clip_by_volume)
  • limit_turnover and drawdown_control are typically applied last

Cross-Sectional Transforms df.cs

The df.cs (cross-sectional) accessor provides transforms computed across all stocks on each date.

Method Overview

Method Description Example
rank() Percentile rank (0, 1] factor.cs.rank()
winsorize(lower, upper) Percentile winsorization factor.cs.winsorize(0.05, 0.95)
bucket(n) Equal-quantile bucketing (1..n) factor.cs.bucket(5)
zscore() Standardization (mean 0, std 1) factor.cs.zscore()
demean() Subtract cross-sectional mean factor.cs.demean()

Usage Example

from finlab import data

roe = data.get('fundamental_features:股東權益報酬率')

# Factor preprocessing: winsorize + standardize
clean_roe = roe.cs.winsorize(0.05, 0.95).cs.zscore()

# Bucket into 5 groups for backtesting
group = roe.cs.bucket(5)
long_group = group == 5  # Long the top quintile

# Percentile rank stock selection
position = roe.cs.rank() > 0.8  # Select top 20%

Sector-Relative Transforms & Aggregation df.sector

The df.sector accessor provides two categories of functionality:

  1. Within-sector transforms: Same 5 methods as df.cs (rank / winsorize / bucket / zscore / demean), computed within each sector
  2. Within-sector aggregation: mean / std / median / sum / min / max / count — sector-level statistics

Transform Methods

Method Description Example
rank() Within-sector percentile rank factor.sector.rank()
winsorize(lower, upper) Within-sector winsorization factor.sector.winsorize(0.05, 0.95)
bucket(n) Within-sector bucketing factor.sector.bucket(3)
zscore() Within-sector standardization factor.sector.zscore()
demean() Within-sector demeaning factor.sector.demean()

Aggregation Methods

Method Description Example
mean() Sector mean roe.sector.mean()
std(ddof=1) Sector standard deviation roe.sector.std()
median() Sector median roe.sector.median()
sum() Sector sum revenue.sector.sum()
min() Sector minimum close.sector.min()
max() Sector maximum close.sector.max()
count() Count of non-NaN values in sector roe.sector.count()

Usage Example

from finlab import data

roe = data.get('fundamental_features:股東權益報酬率')
close = data.get('price:收盤價')

# Within-sector rank (top 20% ROE within each sector)
position = roe.sector.rank() > 0.8

# Within-sector standardization (removes sector bias)
neutralized_roe = roe.sector.zscore()

# Compare against sector average
sector_avg_roe = roe.sector.mean()
above_avg = roe > sector_avg_roe  # Stocks above sector average

sector vs neutralize_industry()

  • df.sector.zscore() achieves the same effect as neutralize_industry() (industry neutralization)
  • df.sector offers more flexibility: ranking, bucketing, aggregation, and more

Complete Example: Multi-Filter Strategy

from finlab import data
from finlab.backtest import sim

# Step 1: Load data
close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
roe = data.get('fundamental_features:股東權益報酬率')

# Step 2: Define filter conditions
cond1 = marketcap.is_smallest(100)  # Small-cap
cond2 = (revenue.average(3) / revenue.average(12)) > 1.2  # Revenue growth
cond3 = roe.rank(pct=True, axis=1) > 0.7  # Top 30% ROE
cond4 = (close > close.average(20)).sustain(3)  # Above 20MA for 3 days

# Step 3: Combine conditions
position = cond1 & cond2 & cond3 & cond4

# Step 4: Add stop-loss/take-profit
position = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Step 5: Limit number of holdings
position = position.is_largest(20)  # Max 20 stocks

# Step 6: Backtest
report = sim(position, resample='M')
report.display()

# Step 7: Analyze performance
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")

FAQ

Q: How does FinlabDataFrame automatically align different frequency data?

from finlab import data

# Daily data
close = data.get('price:收盤價')  # Updated daily
print(close.index[0])  # 2010-01-04

# Monthly data
revenue = data.get('monthly_revenue:當月營收')  # Published around the 10th of each month
print(revenue.index[0])  # 2010-01-10

# Direct operations auto-align (revenue is forward-filled to daily frequency)
position = (close > 100) & (revenue > 1e8)
# position is daily data, revenue has been auto-filled

Alignment Rules: - Low frequency to high frequency: Forward fill - Quarterly reports (financial statements): Automatically delayed to the financial statement announcement date

Q: How do I chain multiple FinlabDataFrame methods?

from finlab import data

close = data.get('price:收盤價')

# Method chaining (similar to pandas method chaining)
position = (
    close
    .average(20)           # Compute 20MA
    .rank(pct=True, axis=1)  # Rank
    .is_largest(30)        # Select top 30
)

# Equivalent to
ma20 = close.average(20)
ranked = ma20.rank(pct=True, axis=1)
position = ranked.is_largest(30)

Q: What is the difference between is_largest() and rank() + threshold?

from finlab import data

close = data.get('price:收盤價')

# Method 1: is_largest (fixed count)
position1 = close.is_largest(30)  # Always selects 30 stocks

# Method 2: rank + threshold (fixed percentage)
position2 = close.rank(pct=True, axis=1) > 0.8  # Selects top 20% (count varies)

# Difference:
# - Method 1: Fixed number of holdings (suitable for fixed capital allocation)
# - Method 2: Variable number of holdings (adapts to market conditions)

Q: How does trailing stop (trail_stop) work?

Trailing stop must be set via the trail_stop parameter in sim() -- hold_until() does not support this feature.

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Example: Entry price is 100
# - stop_loss=0.1: Exit if price drops below 90
# - take_profit=0.2: Exit when price reaches 120 (fixed take-profit)
# - trail_stop=0.2: Exit when price drops 20% from peak (trailing stop)

# Scenario: Entry at 100 -> rises to 150 -> drops to 120
# - take_profit=0.2: Exits at 120, does not wait for 150
# - trail_stop=0.2: Exits when price drops 20% from peak 150 = 120

# Fixed take-profit
report_fixed = sim(position, resample='M', stop_loss=0.1, take_profit=0.2)

# Trailing stop
report_trailing = sim(position, resample='M', stop_loss=0.1, trail_stop=0.2)

Q: How do I use pandas methods on FinlabDataFrame?

from finlab import data

close = data.get('price:收盤價')

# FinlabDataFrame is fully compatible with pandas
close.rolling(20).mean()  # pandas method
close.fillna(0)           # pandas method
close.dropna()            # pandas method

# All pandas methods work
# Results remain FinlabDataFrame (retaining extended features)

Resources