Skip to content

finlab.dataframe

FinlabDataFrame extends Pandas DataFrame with automatic data alignment, technical indicator computation, and stock selection utilities.

Use Cases

  • Automatically align data of different frequencies (daily, weekly, monthly, quarterly)
  • Quickly perform stock selection (ranking, top-N filtering)
  • Compute moving averages and technical indicators
  • Manage entry/exit signals (hold until stop-loss/take-profit)
  • Industry neutralization

Quick Examples

Auto-Alignment and Stock Selection

from finlab import data

# data.get() returns FinlabDataFrame
close = data.get('price:收盤價')  # Daily
revenue = data.get('monthly_revenue:當月營收')  # Monthly

# Automatic date alignment (revenue is forward-filled to daily frequency)
position = (close > close.average(20)) & (revenue > 1e8)

Ranking and Filtering

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Select the 30 smallest by market cap
small_cap = marketcap.is_smallest(30)

# Select the 20 largest by price momentum
momentum = (close / close.shift(20)).is_largest(20)

# Combine strategies
position = small_cap & momentum

Position Management

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Hold until 10% stop-loss or 20% take-profit
position_with_stops = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Backtest
report = sim(position_with_stops, resample='M')
report.display()

FinlabDataFrame vs pandas DataFrame

Feature pandas DataFrame FinlabDataFrame
Basic operations Fully supported Fully supported (inherits pandas)
Auto-alignment Requires manual handling Automatically aligns different frequency data
Stock selection Write your own Built-in is_largest(), is_smallest()
Technical indicators Requires extra packages Built-in average(), rise(), fall()
Position management Write your own Built-in hold_until(), exit_when()
Industry analysis Write your own Built-in industry_rank(), neutralize_industry()

API Reference

FinlabDataFrame

finlab.dataframe.FinlabDataFrame

FinlabDataFrame(*args, **kwargs)

Bases: DataFrame

回測語法糖 除了使用熟悉的 Pandas 語法外,我們也提供很多語法糖,讓大家開發程式時,可以用簡易的語法完成複雜的功能,讓開發策略更簡潔! 我們將所有的語法糖包裹在 FinlabDataFrame 中,用起來跟 pd.DataFrame 一樣,但是多了很多功能! 只要使用 finlab.data.get() 所獲得的資料,皆為 FinlabDataFrame 格式, 接下來我們就來看看, FinlabDataFrame 有哪些好用的語法糖吧!

當資料日期沒有對齊(例如: 財報 vs 收盤價 vs 月報)時,在使用以下運算符號: +, -, *, /, >, >=, ==, <, <=, &, |, ~, 不需要先將資料對齊,因為 FinlabDataFrame 會自動幫你處理,以下是示意圖。

steps

以下是範例:cond1cond2 分別為「每天」,和「每季」的資料,假如要取交集的時間,可以用以下語法:

from finlab import data
# 取得 FinlabDataFrame
close = data.get('price:收盤價')
roa = data.get('fundamental_features:ROA稅後息前')

# 運算兩個選股條件交集
cond1 = close > 37
cond2 = roa > 0
cond_1_2 = cond1 & cond2
擷取 1101 台泥 的訊號如下圖,可以看到 cond1cond2 訊號的頻率雖然不相同, 但是由於 cond1cond2FinlabDataFrame,所以可以直接取交集,而不用處理資料頻率對齊的問題。
imageconds

總結來說,FinlabDataFrame 與一般 dataframe 唯二不同之處: 1. 多了一些 method,如df.is_largest(), df.sustain()...等。 2. 在做四則運算、不等式運算前,會將 df1、df2 的 index 取聯集,column 取交集。

average

average(n)

取 n 筆移動平均

若股票在時間窗格內,有 N/2 筆 NaN,則會產生 NaN。 Args: n (positive-int): 設定移動窗格數。 Returns: (pd.DataFrame): data Examples: 股價在均線之上

from finlab import data
close = data.get('price:收盤價')
sma = close.average(10)
cond = close > sma
只需要簡單的語法,就可以將其中一部分的訊號繪製出來檢查:
import matplotlib.pyplot as plt

close.loc['2021', '2330'].plot()
sma.loc['2021', '2330'].plot()
cond.loc['2021', '2330'].mul(20).add(500).plot()

plt.legend(['close', 'sma', 'cond'])
sma

deadline

deadline()

財務索引轉換成公告截止日

將財務季報 (ex:2022Q1) 從文字格式轉為公告截止日的datetime格式, 通常使用情境為對不同週期的dataframe做reindex,常用於以公告截止日作為訊號產生日。 Returns: (pd.DataFrame): data Examples:

data.get('financial_statement:現金及約當現金').deadline()
data.get('monthly_revenue:當月營收').deadline()

fall

fall(n=1)

數值下降中

取是否比前第n筆低,若符合條件的值則為True,反之為False。 Screen-Shot-2021-10-26-at-6-43-41-AM Args: n (positive-int): 設定比較前第n筆低。 Returns: (pd.DataFrame): data Examples: 收盤價是否低於10日前股價

from finlab import data
data.get('price:收盤價').fall(10)

groupby_category

groupby_category()

資料按產業分群

類似 pd.DataFrame.groupby()的處理效果。 Returns: (pd.DataFrame): data Examples: 半導體平均股價淨值比時間序列

from finlab import data
pe = data.get('price_earning_ratio:股價淨值比')
pe.groupby_category().mean()['半導體'].plot()
pbmean

全球 2020 量化寬鬆加上晶片短缺,使得半導體股價淨值比衝高。

hold_until

hold_until(exit, nstocks_limit=None, stop_loss=-np.inf, take_profit=np.inf, trade_at='close', rank=None)

訊號進出場

這大概是所有策略撰寫中,最重要的語法糖,上述語法中 entries 為進場訊號,而 exits 是出場訊號。所以 entries.hold_until(exits) ,就是進場訊號為 True 時,買入並持有該檔股票,直到出場訊號為 True 則賣出。 Screen-Shot-2021-10-26-at-6-35-05-AM 此函式有很多細部設定,可以讓你最多選擇 N 檔股票做輪動。另外,當超過 N 檔進場訊號發生,也可以按照客制化的排序,選擇優先選入的股票。最後,可以設定價格波動當輪動訊號,來增加出場的時機點。

PARAMETER DESCRIPTION
exit

出場訊號。

TYPE: Dataframe

nstocks_limit (int)`

輪動檔數上限,預設為None。

stop_loss

價格波動輪動訊號,預設為None,不生成輪動訊號。範例:0.1,代表成本價下跌 10% 時產生出場訊號。

TYPE: float DEFAULT: -inf

take_profit

價格波動輪動訊號,預設為None,不生成輪動訊號。範例:0.1,代表成本價上漲 10% 時產生出場訊號。

TYPE: float DEFAULT: inf

trade_at

價格波動輪動訊號參考價,預設為'close'。可選 closeopen

TYPE: str DEFAULT: 'close'

rank

當天進場訊號數量超過 nstocks_limit 時,以 rank 數值越大的股票優先進場。

TYPE: Dataframe DEFAULT: None

RETURNS DESCRIPTION
DataFrame

data

Examples:

價格 > 20 日均線入場, 價格 < 60 日均線出場,最多持有10檔,超過 10 個進場訊號,則以股價淨值比小的股票優先選入。

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
pb = data.get('price_earning_ratio:股價淨值比')

sma20 = close.average(20)
sma60 = close.average(60)

entries = close > sma20
exits = close < sma60

#pb前10小的標的做輪動
position = entries.hold_until(exits, nstocks_limit=10, rank=-pb)
sim(position)

index_str_to_date

index_str_to_date()

財務月季報索引格式轉換

將以下資料的索引轉換成datetime格式:

財務季報 (ex:2022-Q1) 從文字格式轉為財報電子檔資料上傳日。

通常使用情境為對不同週期的dataframe做reindex,常用於以公告截止日作為訊號產生日。

RETURNS DESCRIPTION
DataFrame

data

Examples:

data.get('financial_statement:現金及約當現金').index_str_to_date()

industry_rank

industry_rank(categories=None)

計算產業 ranking 排名,0 代表產業內最低,1 代表產業內最高 Args: categories (list of str): 欲考慮的產業,ex: ['貿易百貨', '雲端運算'],預設為全產業,請參考 data.get('security_industry_themes') 中的產業項目。 Examples: 本意比產業排名分數

from finlab import data

pe = data.get('price_earning_ratio:本益比')
pe_rank = pe.industry_rank()
print(pe_rank)

is_entry

is_entry()

進場點

取進場訊號點,若符合條件的值則為True,反之為False。 Returns: (pd.DataFrame): data Examples: 策略為每日收盤價前10高,取進場點。

from finlab import data
data.get('price:收盤價').is_largest(10).is_entry()

is_exit

is_exit()

出場點

取出場訊號點,若符合條件的值則為 True,反之為 False。 Returns: (pd.DataFrame): data Examples: 策略為每日收盤價前10高,取出場點。

from finlab import data
data.get('price:收盤價').is_largest(10).is_exit()

is_largest

is_largest(n)

取每列前 n 筆大的數值

若符合 True ,反之為 False 。用來篩選每天數值最大的股票。

is-largest Args: n (positive-int): 設定每列前 n 筆大的數值。 Returns: (pd.DataFrame): data Examples: 每季 ROA 前 10 名的股票

from finlab import data

roa = data.get('fundamental_features:ROA稅後息前')
good_stocks = roa.is_largest(10)

is_smallest

is_smallest(n)

取每列前 n 筆小的數值

若符合 True ,反之為 False 。用來篩選每天數值最小的股票。 Args: n (positive-int): 設定每列前 n 筆小的數值。 Returns: (pd.DataFrame): data Examples: 股價淨值比最小的 10 檔股票

from finlab import data

pb = data.get('price_earning_ratio:股價淨值比')
cheap_stocks = pb.is_smallest(10)

neutralize

neutralize(neutralizers, add_const=True)

Multi-factor cross-sectional neutralization

Performs cross-sectional regression to neutralize factors from the data. The residuals from regressing self on the neutralizers are returned.

PARAMETER DESCRIPTION
neutralizers

Factor(s) to neutralize against. Can be: - A single DataFrame - A list of DataFrames - A dict of DataFrames (keys are factor names)

TYPE: Union[DataFrame, list[DataFrame], dict[str, DataFrame]]

add_const

Whether to add a constant term to the regression. Default True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
FinlabDataFrame

Neutralized data (regression residuals)

Examples:

Neutralize a factor against size:

from finlab import data

factor = data.get('price_earning_ratio:本益比')
size = data.get('fundamental_features:市值')

neutralized = factor.neutralize(size)

Neutralize against multiple factors using a list:

neutralized = factor.neutralize([size, beta])

Neutralize against multiple factors using a dict:

neutralized = factor.neutralize({
    'size': size,
    'size2': size ** 2,
    'beta': beta,
})

neutralize_industry

neutralize_industry(categories=None, add_const=True)

Industry neutralization using dummy variables

Performs cross-sectional regression to neutralize industry effects from the data. Each stock is assigned to an industry, and the factor is regressed on industry dummy variables. The residuals (industry-neutral factor) are returned.

PARAMETER DESCRIPTION
categories

Optional DataFrame with 'stock_id' and 'category' columns. If not provided, uses data.get('security_categories').

TYPE: DataFrame DEFAULT: None

add_const

Whether to add a constant term to the regression. Default True. Note: When using industry dummies, adding a constant creates multicollinearity, so one industry dummy is automatically dropped when add_const=True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
FinlabDataFrame

Industry-neutralized data (regression residuals)

Examples:

Neutralize a factor against industry:

from finlab import data

factor = data.get('price_earning_ratio:本益比')
neutralized = factor.neutralize_industry()

Using custom categories:

custom_cats = pd.DataFrame({
    'stock_id': ['2330', '2317', '1101'],
    'category': ['半導體', '電子', '水泥']
})
neutralized = factor.neutralize_industry(categories=custom_cats)

quantile_row

quantile_row(c)

股票當天數值分位數

取得每列c定分位數的值。 Args: c (positive-int): 設定每列 n 定分位數的值。 Returns: (pd.DataFrame): data Examples: 取每日股價前90%分位數

from finlab import data
data.get('price:收盤價').quantile_row(0.9)

rank

rank(*args, valid=None, **kwargs)

Cross-sectional or time-series ranking.

PARAMETER DESCRIPTION
valid

只有 valid 為 True 的 cell 才參與排名。 valid 為 False/NaN 的 cell 在排名前設為 NaN, 因此不影響 pct=True 的分母。 常見用法:fillna 後傳入原始 notna() mask, 避免新上市或歷史不足的股票污染百分位排名。

TYPE: DataFrame or Series of bool DEFAULT: None

rise

rise(n=1)

數值上升中

取是否比前第n筆高,若符合條件的值則為True,反之為False。 Screen-Shot-2021-10-26-at-6-43-41-AM Args: n (positive-int): 設定比較前第n筆高。 Returns: (pd.DataFrame): data Examples: 收盤價是否高於10日前股價

from finlab import data
data.get('price:收盤價').rise(10)

sustain

sustain(nwindow, nsatisfy=None)

持續 N 天滿足條件

取移動 nwindow 筆加總大於等於nsatisfy,若符合條件的值則為True,反之為False。

PARAMETER DESCRIPTION
nwindow

設定移動窗格。

TYPE: positive - int

nsatisfy

設定移動窗格計算後最低滿足數值。

TYPE: positive - int DEFAULT: None

Returns: (pd.DataFrame): data Examples: 收盤價是否連兩日上漲

from finlab import data
data.get('price:收盤價').rise().sustain(2)

Key FinlabDataFrame Methods:

Stock Selection and Ranking

Method Description Example
is_largest(n) Select the N largest values marketcap.is_largest(30)
is_smallest(n) Select the N smallest values pe_ratio.is_smallest(20)
rank() Compute ranking (supports industry ranking) close.rank(pct=True)
industry_rank() Ranking within industry close.industry_rank()

Technical Indicators

Method Description Example
average(n) N-period moving average close.average(20)
rise(n) Consecutive N-period rise close.rise(3)
fall(n) Consecutive N-period fall close.fall(2)
sustain(n) Condition sustained for N periods (close > ma20).sustain(5)

Position Management

Method Description Example
hold_until() Hold until stop-loss/take-profit position.hold_until(stop_loss=0.1)
exit_when() Exit on condition position.exit_when(close < ma20)
entry_price() Get entry price position.entry_price('close')

Industry Analysis

Method Description Example
neutralize_industry() Industry neutralization factor.neutralize_industry()
groupby_category() Group by industry close.groupby_category().mean()

average()

Compute moving average.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# 20-day moving average
ma20 = close.average(20)

# 60-day moving average
ma60 = close.average(60)

# Moving average crossover strategy
golden_cross = ma20 > ma60

Difference from pandas rolling()

  • average(20) is equivalent to rolling(20).mean()
  • More concise and readable
  • Return value is still a FinlabDataFrame, allowing method chaining

is_largest() / is_smallest()

Select top/bottom N by ranking.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Select the 30 smallest by market cap (small-cap stocks)
small_cap = marketcap.is_smallest(30)

# Select the 20 largest by momentum (momentum stocks)
momentum = (close / close.shift(20)).is_largest(20)

# Combine: momentum stocks among small-caps
position = small_cap & momentum

Notes

  • is_largest(n) re-ranks every period (daily/monthly)
  • If fewer than N stocks are available, all are selected
  • Returns a Boolean DataFrame (True = selected)

rank()

Compute ranking (supports percentile ranking).

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')

# Compute percentile ranking (0-1)
cap_rank = marketcap.rank(pct=True, axis=1)

# Select top 30% by market cap
position = cap_rank > 0.7

# Or use is_largest (more intuitive)
position = marketcap.is_largest(int(len(marketcap.columns) * 0.3))

industry_rank()

Ranking within industry.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# Compute within-industry ranking (0-1)
industry_rank = close.industry_rank()

# Select top 30% performers within each industry
position = industry_rank > 0.7

Industry Classification Source

  • Uses TEJ industry classification
  • Automatically handles industry reclassification

rise() / fall()

Detect consecutive rises/falls.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')

# 3 consecutive days of rising
rising = close.rise(3)

# 2 consecutive days of falling
falling = close.fall(2)

# Strategy: reverse signal after consecutive rises
position = ~rising  # Inverse signal

sustain()

Check if a condition is sustained for N periods.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
ma20 = close.average(20)

# Price stays above 20-day MA for 5 consecutive days
sustained_above_ma = (close > ma20).sustain(5)

# Strategy
position = sustained_above_ma

sustain vs rise

  • rise(n): value increases for n consecutive periods
  • sustain(n): condition is True for n consecutive periods

hold_until()

Hold position until stop-loss/take-profit triggers.

Usage Examples:

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Basic usage: 10% stop-loss
position_with_sl = position.hold_until(stop_loss=0.1)

# 10% stop-loss + 20% take-profit
position_with_stops = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Backtest
report = sim(position_with_stops, resample='M')
report.display()

# Trailing stop requires setting in sim()
report = sim(position, resample='M', trail_stop=0.2)

Key Concepts

  • hold_until() extends the holding period until the trigger condition
  • Stop-loss/take-profit are calculated based on the entry price
  • Trailing stop must be set via sim(trail_stop=...), exits when price drops a specified percentage from the peak

exit_when()

Exit when a specific condition is met.

Usage Examples:

from finlab import data

close = data.get('price:收盤價')
ma20 = close.average(20)
ma60 = close.average(60)

# Entry: short MA crosses above long MA
entry = ma20 > ma60

# Exit: short MA crosses below long MA
exit_signal = ma20 < ma60

# Hold until exit signal
position = entry.exit_when(exit_signal)

exit_when vs hold_until

  • exit_when(): Custom exit conditions (e.g., technical indicators)
  • hold_until(): Fixed stop-loss/take-profit percentages

neutralize_industry()

Industry neutralization - removes industry factor effects.

Usage Examples:

from finlab import data

# Raw factor (with industry influence)
factor = data.get('fundamental_features:股東權益報酬率')

# Industry neutralization (removes industry average)
neutral_factor = factor.neutralize_industry()

# Use neutralized factor
position = neutral_factor.is_largest(30)

Why Neutralize?

  • Some industries naturally have higher ROE (e.g., financial sector)
  • Neutralization avoids over-concentration in a single industry
  • Improves strategy stability

Complete Example: Multi-Filter Strategy

from finlab import data
from finlab.backtest import sim

# Step 1: Load data
close = data.get('price:收盤價')
marketcap = data.get('etl:market_value')
revenue = data.get('monthly_revenue:當月營收')
roe = data.get('fundamental_features:股東權益報酬率')

# Step 2: Define filter conditions
cond1 = marketcap.is_smallest(100)  # Small-cap
cond2 = (revenue.average(3) / revenue.average(12)) > 1.2  # Revenue growth
cond3 = roe.rank(pct=True, axis=1) > 0.7  # Top 30% ROE
cond4 = (close > close.average(20)).sustain(3)  # Above 20MA for 3 days

# Step 3: Combine conditions
position = cond1 & cond2 & cond3 & cond4

# Step 4: Add stop-loss/take-profit
position = position.hold_until(stop_loss=0.1, take_profit=0.2)

# Step 5: Limit number of holdings
position = position.is_largest(20)  # Max 20 stocks

# Step 6: Backtest
report = sim(position, resample='M')
report.display()

# Step 7: Analyze performance
metrics = report.get_metrics()
print(f"Annual return: {metrics['annual_return']:.2%}")
print(f"Sharpe ratio: {metrics['daily_sharpe']:.2f}")
print(f"Max drawdown: {metrics['max_drawdown']:.2%}")

FAQ

Q: How does FinlabDataFrame automatically align different frequency data?

from finlab import data

# Daily data
close = data.get('price:收盤價')  # Updated daily
print(close.index[0])  # 2010-01-04

# Monthly data
revenue = data.get('monthly_revenue:當月營收')  # Published around the 10th of each month
print(revenue.index[0])  # 2010-01-10

# Direct operations auto-align (revenue is forward-filled to daily frequency)
position = (close > 100) & (revenue > 1e8)
# position is daily data, revenue has been auto-filled

Alignment Rules: - Low frequency to high frequency: Forward fill - Quarterly reports (financial statements): Automatically delayed to the financial statement announcement date

Q: How do I chain multiple FinlabDataFrame methods?

from finlab import data

close = data.get('price:收盤價')

# Method chaining (similar to pandas method chaining)
position = (
    close
    .average(20)           # Compute 20MA
    .rank(pct=True, axis=1)  # Rank
    .is_largest(30)        # Select top 30
)

# Equivalent to
ma20 = close.average(20)
ranked = ma20.rank(pct=True, axis=1)
position = ranked.is_largest(30)

Q: What is the difference between is_largest() and rank() + threshold?

from finlab import data

close = data.get('price:收盤價')

# Method 1: is_largest (fixed count)
position1 = close.is_largest(30)  # Always selects 30 stocks

# Method 2: rank + threshold (fixed percentage)
position2 = close.rank(pct=True, axis=1) > 0.8  # Selects top 20% (count varies)

# Difference:
# - Method 1: Fixed number of holdings (suitable for fixed capital allocation)
# - Method 2: Variable number of holdings (adapts to market conditions)

Q: How does trailing stop (trail_stop) work?

Trailing stop must be set via the trail_stop parameter in sim() -- hold_until() does not support this feature.

from finlab import data
from finlab.backtest import sim

close = data.get('price:收盤價')
position = close > close.average(20)

# Example: Entry price is 100
# - stop_loss=0.1: Exit if price drops below 90
# - take_profit=0.2: Exit when price reaches 120 (fixed take-profit)
# - trail_stop=0.2: Exit when price drops 20% from peak (trailing stop)

# Scenario: Entry at 100 -> rises to 150 -> drops to 120
# - take_profit=0.2: Exits at 120, does not wait for 150
# - trail_stop=0.2: Exits when price drops 20% from peak 150 = 120

# Fixed take-profit
report_fixed = sim(position, resample='M', stop_loss=0.1, take_profit=0.2)

# Trailing stop
report_trailing = sim(position, resample='M', stop_loss=0.1, trail_stop=0.2)

Q: How do I use pandas methods on FinlabDataFrame?

from finlab import data

close = data.get('price:收盤價')

# FinlabDataFrame is fully compatible with pandas
close.rolling(20).mean()  # pandas method
close.fillna(0)           # pandas method
close.dropna()            # pandas method

# All pandas methods work
# Results remain FinlabDataFrame (retaining extended features)

Resources