finlab.dataframe

FinlabDataFrame extends Pandas DataFrame with automatic data alignment, technical indicator computation, and stock selection utilities.

Use Cases

Automatically align data of different frequencies (daily, weekly, monthly, quarterly)
Quickly perform stock selection (ranking, top-N filtering)
Compute moving averages and technical indicators
Manage entry/exit signals (hold until stop-loss/take-profit)
Industry neutralization

Quick Examples

Auto-Alignment and Stock Selection

from finlab import data

# data.get() returns FinlabDataFrame
close = data.get('price:收盤價')  # Daily
revenue = data.get('monthly_revenue:當月營收')  # Monthly

# Automatic date alignment (revenue is forward-filled to daily frequency)
position = (close > close.average(20)) & (revenue > 1e8)

Ranking and Filtering

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Select the 30 smallest by market cap
small_cap = marketcap.is_smallest(30)

# Select the 20 largest by price momentum
momentum = (close / close.shift(20)).is_largest(20)

# Combine strategies
position = small_cap & momentum

Position Management

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Hold until 10% stop-loss or 20% take-profit
position_with_stops = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Backtest
report = sim(position_with_stops, resample='M')
report.display()

FinlabDataFrame vs pandas DataFrame

Feature	pandas DataFrame	FinlabDataFrame
Basic operations	Fully supported	Fully supported (inherits pandas)
Auto-alignment	Requires manual handling	Automatically aligns different frequency data
Stock selection	Write your own	Built-in `is_largest()`, `is_smallest()`
Technical indicators	Requires extra packages	Built-in `average()`, `rise()`, `fall()`
Position management	Write your own	Built-in `hold_until()`, `exit_when()`
Industry analysis	Write your own	Built-in `industry_rank()`, `neutralize_industry()`

API Reference

FinlabDataFrame

finlab.dataframe.FinlabDataFrame

FinlabDataFrame(*args, **kwargs)

Bases: DataFrame

回測語法糖除了使用熟悉的 Pandas 語法外，我們也提供很多語法糖，讓大家開發程式時，可以用簡易的語法完成複雜的功能，讓開發策略更簡潔！我們將所有的語法糖包裹在 FinlabDataFrame 中，用起來跟 pd.DataFrame 一樣，但是多了很多功能！只要使用 finlab.data.get() 所獲得的資料，皆為 FinlabDataFrame 格式，接下來我們就來看看， FinlabDataFrame 有哪些好用的語法糖吧！

當資料日期沒有對齊（例如: 財報 vs 收盤價 vs 月報）時，在使用以下運算符號： +, -, *, /, >, >=, ==, <, <=, &, |, ~，不需要先將資料對齊，因為 FinlabDataFrame 會自動幫你處理，以下是示意圖。

steps

以下是範例：cond1 與 cond2 分別為「每天」，和「每季」的資料，假如要取交集的時間，可以用以下語法：

from finlab import data
# 取得 FinlabDataFrame
close = data.get('price:收盤價')
roa = data.get('fundamental_features:ROA稅後息前')

# 運算兩個選股條件交集
cond1 = close > 37
cond2 = roa > 0
cond_1_2 = cond1 & cond2

擷取 1101 台泥的訊號如下圖，可以看到 cond1 跟 cond2 訊號的頻率雖然不相同，但是由於 cond1 跟 cond2 是 FinlabDataFrame，所以可以直接取交集，而不用處理資料頻率對齊的問題。
imageconds

總結來說，FinlabDataFrame 與一般 dataframe 唯二不同之處： 1. 多了一些 method，如df.is_largest(), df.sustain()...等。 2. 在做四則運算、不等式運算前，會將 df1、df2 的 index 取聯集，column 取交集。

average

average(n)

取 n 筆移動平均

若股票在時間窗格內，有 N/2 筆 NaN，則會產生 NaN。 Args: n (positive-int): 設定移動窗格數。 Returns: (pd.DataFrame): data Examples: 股價在均線之上

from finlab import data
close = data.get('price:收盤價')
sma = close.average(10)
cond = close > sma

只需要簡單的語法，就可以將其中一部分的訊號繪製出來檢查：

import matplotlib.pyplot as plt

close.loc['2021', '2330'].plot()
sma.loc['2021', '2330'].plot()
cond.loc['2021', '2330'].mul(20).add(500).plot()

plt.legend(['close', 'sma', 'cond'])

deadline

deadline()

財務索引轉換成公告截止日

將財務季報 (ex:2022Q1) 從文字格式轉為公告截止日的datetime格式，通常使用情境為對不同週期的dataframe做reindex，常用於以公告截止日作為訊號產生日。 Returns: (pd.DataFrame): data Examples:

data.get('financial_statement:現金及約當現金').deadline()
data.get('monthly_revenue:當月營收').deadline()

fall

fall(n=1)

數值下降中

取是否比前第n筆低，若符合條件的值則為True，反之為False。 Screen-Shot-2021-10-26-at-6-43-41-AM Args: n (positive-int): 設定比較前第n筆低。 Returns: (pd.DataFrame): data Examples: 收盤價是否低於10日前股價

from finlab import data
data.get('price:收盤價').fall(10)

groupby_category

groupby_category()

資料按產業分群

類似 pd.DataFrame.groupby()的處理效果。 Returns: (pd.DataFrame): data Examples: 半導體平均股價淨值比時間序列

from finlab import data
pe = data.get('price_earning_ratio:股價淨值比')
pe.groupby_category().mean()['半導體'].plot()

全球 2020 量化寬鬆加上晶片短缺，使得半導體股價淨值比衝高。

hold_until

hold_until(exit, nstocks_limit=None, stop_loss=-np.inf, take_profit=np.inf, trade_at='close', rank=None)

訊號進出場

這大概是所有策略撰寫中，最重要的語法糖，上述語法中 entries 為進場訊號，而 exits 是出場訊號。所以 entries.hold_until(exits) ，就是進場訊號為 True 時，買入並持有該檔股票，直到出場訊號為 True 則賣出。 Screen-Shot-2021-10-26-at-6-35-05-AM 此函式有很多細部設定，可以讓你最多選擇 N 檔股票做輪動。另外，當超過 N 檔進場訊號發生，也可以按照客制化的排序，選擇優先選入的股票。最後，可以設定價格波動當輪動訊號，來增加出場的時機點。

PARAMETER	DESCRIPTION
`exit`	出場訊號。 TYPE: `Dataframe`
nstocks_limit (int)`	輪動檔數上限，預設為None。
`stop_loss`	價格波動輪動訊號，預設為None，不生成輪動訊號。範例：0.1，代表成本價下跌 10% 時產生出場訊號。 TYPE: `float` DEFAULT: `-inf`
`take_profit`	價格波動輪動訊號，預設為None，不生成輪動訊號。範例：0.1，代表成本價上漲 10% 時產生出場訊號。 TYPE: `float` DEFAULT: `inf`
`trade_at`	價格波動輪動訊號參考價，預設為'close'。可選 `close` 或 `open`。 TYPE: `str` DEFAULT: `'close'`
`rank`	當天進場訊號數量超過 nstocks_limit 時，以 rank 數值越大的股票優先進場。 TYPE: `Dataframe` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	data

Examples:

價格 > 20 日均線入場, 價格 < 60 日均線出場，最多持有10檔，超過 10 個進場訊號，則以股價淨值比小的股票優先選入。

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
pb = data.get('price_earning_ratio:股價淨值比')

sma20 = close.average(20)
sma60 = close.average(60)

entries = close > sma20
exits = close < sma60

#pb前10小的標的做輪動
position = entries.hold_until(exits, nstocks_limit=10, rank=-pb)
sim(position)

index_str_to_date

index_str_to_date()

財務月季報索引格式轉換

將以下資料的索引轉換成datetime格式:

財務季報 (ex:2022-Q1) 從文字格式轉為財報電子檔資料上傳日。

通常使用情境為對不同週期的dataframe做reindex，常用於以公告截止日作為訊號產生日。

RETURNS	DESCRIPTION
`DataFrame`	data

Examples:

data.get('financial_statement:現金及約當現金').index_str_to_date()

industry_rank

industry_rank(categories=None)

計算產業 ranking 排名，0 代表產業內最低，1 代表產業內最高 Args: categories (list of str): 欲考慮的產業，ex: ['貿易百貨', '雲端運算']，預設為全產業，請參考 data.get('security_industry_themes') 中的產業項目。 Examples: 本意比產業排名分數

from finlab import data

pe = data.get('price_earning_ratio:本益比')
pe_rank = pe.industry_rank()
print(pe_rank)

is_entry

is_entry()

進場點

取進場訊號點，若符合條件的值則為True，反之為False。 Returns: (pd.DataFrame): data Examples: 策略為每日收盤價前10高，取進場點。

from finlab import data
data.get('price:收盤價').is_largest(10).is_entry()

is_exit

is_exit()

出場點

取出場訊號點，若符合條件的值則為 True，反之為 False。 Returns: (pd.DataFrame): data Examples: 策略為每日收盤價前10高，取出場點。

from finlab import data
data.get('price:收盤價').is_largest(10).is_exit()

is_largest

is_largest(n)

取每列前 n 筆大的數值

若符合 True ，反之為 False 。用來篩選每天數值最大的股票。

is-largest Args: n (positive-int): 設定每列前 n 筆大的數值。 Returns: (pd.DataFrame): data Examples: 每季 ROA 前 10 名的股票

from finlab import data

roa = data.get('fundamental_features:ROA稅後息前')
good_stocks = roa.is_largest(10)

is_smallest

is_smallest(n)

取每列前 n 筆小的數值

若符合 True ，反之為 False 。用來篩選每天數值最小的股票。 Args: n (positive-int): 設定每列前 n 筆小的數值。 Returns: (pd.DataFrame): data Examples: 股價淨值比最小的 10 檔股票

from finlab import data

pb = data.get('price_earning_ratio:股價淨值比')
cheap_stocks = pb.is_smallest(10)

neutralize

neutralize(neutralizers, add_const=True)

Multi-factor cross-sectional neutralization

Performs cross-sectional regression to neutralize factors from the data. The residuals from regressing self on the neutralizers are returned.

PARAMETER	DESCRIPTION
`neutralizers`	Factor(s) to neutralize against. Can be: - A single DataFrame - A list of DataFrames - A dict of DataFrames (keys are factor names) TYPE: `Union[DataFrame, list[DataFrame], dict[str, DataFrame]]`
`add_const`	Whether to add a constant term to the regression. Default True. TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`FinlabDataFrame`	Neutralized data (regression residuals)

Examples:

Neutralize a factor against size:

from finlab import data

factor = data.get('price_earning_ratio:本益比')
size = data.get('fundamental_features:市值')

neutralized = factor.neutralize(size)

Neutralize against multiple factors using a list:

neutralized = factor.neutralize([size, beta])

Neutralize against multiple factors using a dict:

neutralized = factor.neutralize({
    'size': size,
    'size2': size ** 2,
    'beta': beta,
})

neutralize_industry

neutralize_industry(categories=None, add_const=True)

Industry neutralization using dummy variables

Performs cross-sectional regression to neutralize industry effects from the data. Each stock is assigned to an industry, and the factor is regressed on industry dummy variables. The residuals (industry-neutral factor) are returned.

PARAMETER	DESCRIPTION
`categories`	Optional DataFrame with 'stock_id' and 'category' columns. If not provided, uses `data.get('security_categories')`. TYPE: `DataFrame` DEFAULT: `None`
`add_const`	Whether to add a constant term to the regression. Default True. Note: When using industry dummies, adding a constant creates multicollinearity, so one industry dummy is automatically dropped when add_const=True. TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`FinlabDataFrame`	Industry-neutralized data (regression residuals)

Examples:

Neutralize a factor against industry:

from finlab import data

factor = data.get('price_earning_ratio:本益比')
neutralized = factor.neutralize_industry()

Using custom categories:

custom_cats = pd.DataFrame({
    'stock_id': ['2330', '2317', '1101'],
    'category': ['半導體', '電子', '水泥']
})
neutralized = factor.neutralize_industry(categories=custom_cats)

quantile_row

quantile_row(c)

股票當天數值分位數

取得每列c定分位數的值。 Args: c (positive-int): 設定每列 n 定分位數的值。 Returns: (pd.DataFrame): data Examples: 取每日股價前90％分位數

from finlab import data
data.get('price:收盤價').quantile_row(0.9)

rank

rank(*args, valid=None, **kwargs)

Cross-sectional or time-series ranking.

PARAMETER	DESCRIPTION
`valid`	只有 valid 為 True 的 cell 才參與排名。 valid 為 False/NaN 的 cell 在排名前設為 NaN，因此不影響 pct=True 的分母。常見用法：fillna 後傳入原始 notna() mask，避免新上市或歷史不足的股票污染百分位排名。 TYPE: `DataFrame or Series of bool` DEFAULT: `None`

rise

rise(n=1)

數值上升中

取是否比前第n筆高，若符合條件的值則為True，反之為False。 Screen-Shot-2021-10-26-at-6-43-41-AM Args: n (positive-int): 設定比較前第n筆高。 Returns: (pd.DataFrame): data Examples: 收盤價是否高於10日前股價

from finlab import data
data.get('price:收盤價').rise(10)

sustain

sustain(nwindow, nsatisfy=None)

持續 N 天滿足條件

取移動 nwindow 筆加總大於等於nsatisfy，若符合條件的值則為True，反之為False。

PARAMETER	DESCRIPTION
`nwindow`	設定移動窗格。 TYPE: `positive - int`
`nsatisfy`	設定移動窗格計算後最低滿足數值。 TYPE: `positive - int` DEFAULT: `None`

Returns: (pd.DataFrame): data Examples: 收盤價是否連兩日上漲

from finlab import data
data.get('price:收盤價').rise().sustain(2)

Key FinlabDataFrame Methods:

Stock Selection and Ranking

Method	Description	Example
`is_largest(n)`	Select the N largest values	`marketcap.is_largest(30)`
`is_smallest(n)`	Select the N smallest values	`pe_ratio.is_smallest(20)`
`rank()`	Compute ranking (supports industry ranking)	`close.rank(pct=True)`
`industry_rank()`	Ranking within industry	`close.industry_rank()`

Technical Indicators

Method	Description	Example
`average(n)`	N-period moving average	`close.average(20)`
`rise(n)`	Consecutive N-period rise	`close.rise(3)`
`fall(n)`	Consecutive N-period fall	`close.fall(2)`
`sustain(n)`	Condition sustained for N periods	`(close > ma20).sustain(5)`

Position Management

Method	Description	Example
`hold_until()`	Hold until stop-loss/take-profit	`position.hold_until(stop_loss=0.1)`
`exit_when()`	Exit on condition	`position.exit_when(close < ma20)`
`entry_price()`	Get entry price	`position.entry_price('close')`

Industry Analysis

Method	Description	Example
`neutralize_industry()`	Industry neutralization	`factor.neutralize_industry()`
`groupby_category()`	Group by industry	`close.groupby_category().mean()`

average()

Compute moving average.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# 20-day moving average
ma20 = close.average(20)

# 60-day moving average
ma60 = close.average(60)

# Moving average crossover strategy
golden_cross = ma20 > ma60

Difference from pandas rolling()

average(20) is equivalent to rolling(20).mean()
More concise and readable
Return value is still a FinlabDataFrame, allowing method chaining

is_largest() / is_smallest()

Select top/bottom N by ranking.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Select the 30 smallest by market cap (small-cap stocks)
small_cap = marketcap.is_smallest(30)

# Select the 20 largest by momentum (momentum stocks)
momentum = (close / close.shift(20)).is_largest(20)

# Combine: momentum stocks among small-caps
position = small_cap & momentum

Notes

is_largest(n) re-ranks every period (daily/monthly)
If fewer than N stocks are available, all are selected
Returns a Boolean DataFrame (True = selected)

rank()

Compute ranking (supports percentile ranking).

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Compute percentile ranking (0-1)
cap_rank = marketcap.rank(pct=True, axis=1)

# Select top 30% by market cap
position = cap_rank > 0.7

# Or use is_largest (more intuitive)
position = marketcap.is_largest(int(len(marketcap.columns) * 0.3))

industry_rank()

Ranking within industry.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# Compute within-industry ranking (0-1)
industry_rank = close.industry_rank()

# Select top 30% performers within each industry
position = industry_rank > 0.7

Industry Classification Source

Uses TEJ industry classification
Automatically handles industry reclassification

rise() / fall()

Detect consecutive rises/falls.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# 3 consecutive days of rising
rising = close.rise(3)

# 2 consecutive days of falling
falling = close.fall(2)

# Strategy: reverse signal after consecutive rises
position = ~rising  # Inverse signal

sustain()

Check if a condition is sustained for N periods.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
ma20 = close.average(20)

# Price stays above 20-day MA for 5 consecutive days
sustained_above_ma = (close > ma20).sustain(5)

# Strategy
position = sustained_above_ma

sustain vs rise

rise(n): value increases for n consecutive periods
sustain(n): condition is True for n consecutive periods

hold_until()

Hold position until stop-loss/take-profit triggers.

Usage Examples:

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Basic usage: 10% stop-loss
position_with_sl = position.hold_until(stop_loss=0.1)

# 10% stop-loss + 20% take-profit
position_with_stops = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Backtest
report = sim(position_with_stops, resample='M')
report.display()

# Trailing stop requires setting in sim()
report = sim(position, resample='M', trail_stop=0.2)

Key Concepts

hold_until() extends the holding period until the trigger condition
Stop-loss/take-profit are calculated based on the entry price
Trailing stop must be set via sim(trail_stop=...), exits when price drops a specified percentage from the peak

exit_when()

Exit when a specific condition is met.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
ma20 = close.average(20)
ma60 = close.average(60)

# Entry: short MA crosses above long MA
entry = ma20 > ma60

# Exit: short MA crosses below long MA
exit_signal = ma20 < ma60

# Hold until exit signal
position = entry.exit_when(exit_signal)

exit_when vs hold_until

exit_when(): Custom exit conditions (e.g., technical indicators)
hold_until(): Fixed stop-loss/take-profit percentages

neutralize_industry()

Industry neutralization - removes industry factor effects.

Usage Examples:

from finlab import data

# Raw factor (with industry influence)
factor = data.get('fundamental_features:股東權益報酬率')

# Industry neutralization (removes industry average)
neutral_factor = factor.neutralize_industry()

# Use neutralized factor
position = neutral_factor.is_largest(30)

Why Neutralize?

Some industries naturally have higher ROE (e.g., financial sector)
Neutralization avoids over-concentration in a single industry
Improves strategy stability

Complete Example: Multi-Filter Strategy

from finlab import data
from finlab.backtest import sim

# Step 1: Load data
close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
roe = data.get('fundamental_features:股東權益報酬率')

# Step 2: Define filter conditions
cond1 = marketcap.is_smallest(100)  # Small-cap
cond2 = (revenue.average(3) / revenue.average(12)) > 1.2  # Revenue growth
cond3 = roe.rank(pct=True, axis=1) > 0.7  # Top 30% ROE
cond4 = (close > close.average(20)).sustain(3)  # Above 20MA for 3 days

# Step 3: Combine conditions
position = cond1 & cond2 & cond3 & cond4

# Step 4: Add stop-loss/take-profit
position = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Step 5: Limit number of holdings
position = position.is_largest(20)  # Max 20 stocks

# Step 6: Backtest
report = sim(position, resample='M')
report.display()

# Step 7: Analyze performance
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")

FAQ

Q: How does FinlabDataFrame automatically align different frequency data?

from finlab import data

# Daily data
close = data.get('price:收盤價')  # Updated daily
print(close.index[0])  # 2010-01-04

# Monthly data
revenue = data.get('monthly_revenue:當月營收')  # Published around the 10th of each month
print(revenue.index[0])  # 2010-01-10

# Direct operations auto-align (revenue is forward-filled to daily frequency)
position = (close > 100) & (revenue > 1e8)
# position is daily data, revenue has been auto-filled

Alignment Rules: - Low frequency to high frequency: Forward fill - Quarterly reports (financial statements): Automatically delayed to the financial statement announcement date

Q: How do I chain multiple FinlabDataFrame methods?

from finlab import data

close = data.get('price:收盤價')

# Method chaining (similar to pandas method chaining)
position = (
    close
    .average(20)           # Compute 20MA
    .rank(pct=True, axis=1)  # Rank
    .is_largest(30)        # Select top 30
)

# Equivalent to
ma20 = close.average(20)
ranked = ma20.rank(pct=True, axis=1)
position = ranked.is_largest(30)

Q: What is the difference between is_largest() and rank() + threshold?

from finlab import data

close = data.get('price:收盤價')

# Method 1: is_largest (fixed count)
position1 = close.is_largest(30)  # Always selects 30 stocks

# Method 2: rank + threshold (fixed percentage)
position2 = close.rank(pct=True, axis=1) > 0.8  # Selects top 20% (count varies)

# Difference:
# - Method 1: Fixed number of holdings (suitable for fixed capital allocation)
# - Method 2: Variable number of holdings (adapts to market conditions)

Q: How does trailing stop (trail_stop) work?

Trailing stop must be set via the trail_stop parameter in sim() -- hold_until() does not support this feature.

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Example: Entry price is 100
# - stop_loss=0.1: Exit if price drops below 90
# - take_profit=0.2: Exit when price reaches 120 (fixed take-profit)
# - trail_stop=0.2: Exit when price drops 20% from peak (trailing stop)

# Scenario: Entry at 100 -> rises to 150 -> drops to 120
# - take_profit=0.2: Exits at 120, does not wait for 150
# - trail_stop=0.2: Exits when price drops 20% from peak 150 = 120

# Fixed take-profit
report_fixed = sim(position, resample='M', stop_loss=0.1, take_profit=0.2)

# Trailing stop
report_trailing = sim(position, resample='M', stop_loss=0.1, trail_stop=0.2)

Q: How do I use pandas methods on FinlabDataFrame?

from finlab import data

close = data.get('price:收盤價')

# FinlabDataFrame is fully compatible with pandas
close.rolling(20).mean()  # pandas method
close.fillna(0)           # pandas method
close.dropna()            # pandas method

# All pandas methods work
# Results remain FinlabDataFrame (retaining extended features)

Resources

Stock Selection Tutorial - FinlabDataFrame practical applications
Backtesting Tutorial - Complete backtesting workflow
Technical Indicator Selection Examples - Strategy examples
pandas Official Documentation - pandas fundamentals