finlab.data

Core data download module providing historical data for Taiwan and US stock markets.

Use Cases

  • Download historical data such as stock prices, financial statements, and institutional trading
  • Filter data by market or industry sector
  • Search for available data tables and fields
  • Configure data caching strategies
  • Limit data download range to save memory

Quick Examples

Basic Usage: Download Data

from finlab import data

# Download closing prices
close = data.get('price:收盤價')

# Download P/E ratio
pe_ratio = data.get('price_earning_ratio:本益比')

# Download monthly revenue
revenue = data.get('monthly_revenue:當月營收')

Search Available Fields

# Search for fields containing "收盤" (close)
data.search('收盤')
# Output: ['price:收盤價', 'etl:不含除權息收盤價', ...]

# Search US stock data
data.search('close', market='us')

Restrict Market Scope

# Only fetch listed (TSE) company data
with data.universe(market='TSE'):
    close = data.get('price:收盤價')

# Only fetch specific industry sectors
with data.universe(category=['水泥工業', '食品工業']):
    close = data.get('price:收盤價')

Detailed Guide

See Data Download Details for:

  • Complete data download tutorial
  • Data table structure explanation
  • Advanced filtering techniques
  • Error handling methods


Global Configuration

Force Cloud/Local Data

from finlab import data

# Force cloud download (re-download every time)
data.force_cloud_download = True

# Force local cache only (offline environment)
data.use_local_data_only = True

Limit Data Time Range

# Only download data from 2020-2023 (saves memory)
data.truncate_start = '2020-01-01'
data.truncate_end = '2023-12-31'

# All subsequent data.get() calls will use this range
close = data.get('price:收盤價')

Recommended Usage

  • Development phase: Use truncate_start to limit data range for faster testing
  • Production backtesting: Remove truncate limits, use full historical data
  • Low memory: Set truncate_start or use use_local_data_only
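One way to keep the development and production settings apart is a small profile table that is applied in one place. The sketch below is only an illustration: `PROFILES` and `apply_profile` are hypothetical names, the `SimpleNamespace` stands in for the `finlab.data` module so the example runs offline, and it assumes that setting a truncate attribute to `None` removes the limit.

```python
from types import SimpleNamespace

# Stand-in for the `finlab.data` module so the sketch runs offline.
# In real code you would pass `finlab.data` itself.
data = SimpleNamespace(truncate_start=None, truncate_end=None)

# Hypothetical profiles; the dates are illustrative only.
# Assumption: None means "no truncation".
PROFILES = {
    "dev": {"truncate_start": "2020-01-01", "truncate_end": None},
    "prod": {"truncate_start": None, "truncate_end": None},
}


def apply_profile(module, name):
    """Copy the truncate settings of the named profile onto `module`."""
    for attr, value in PROFILES[name].items():
        setattr(module, attr, value)
    return module


apply_profile(data, "dev")
print(data.truncate_start)  # 2020-01-01

apply_profile(data, "prod")
print(data.truncate_start)  # None
```

Keeping the settings in one table makes it harder to leave a development-only truncation in place when running a full backtest.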

API Reference

data.get()

finlab.data.get

get(dataset, save_to_storage=True, force_download=False)

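Download historical data.

Visit the historical data catalog to find the names of all available datasets, then pass a name to this function to fetch the data. If save_to_storage is True, the program automatically keeps a local copy to avoid re-downloading large amounts of data.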

PARAMETER DESCRIPTION
dataset

The name of dataset.

TYPE: str

save_to_storage

Whether to save the dataset to storage for later use. Default is True. The argument will be removed in the future. Please use data.set_storage(FileStorage(use_cache=True)) instead.

TYPE: bool DEFAULT: True

force_download

Whether to force download the dataset from cloud. Default is False.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
DataFrame

financial data

Examples:

To download the historical closing prices of all listed and OTC stocks, just call this function:

from finlab import data
close = data.get('price:收盤價')
close
date 0015 0050 0051 0052 0053
2007-04-23 9.54 57.85 32.83 38.4 nan
2007-04-24 9.54 58.1 32.99 38.65 nan
2007-04-25 9.52 57.6 32.8 38.59 nan
2007-04-26 9.59 57.7 32.8 38.6 nan
2007-04-27 9.55 57.5 32.72 38.4 nan

Note

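By default, data.get downloads recent data first and merges it with local data to avoid re-downloading large amounts of data.

To force a full download of all data, set the following before downloading:

data.force_cloud_download = True

To use local data only, with no additional download, set the following before downloading:

data.use_local_data_only = True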

Common Data Tables

Price Data:

  • price:收盤價 - Daily closing price
  • price:開盤價 - Daily opening price
  • price:最高價 / price:最低價 - Daily high/low
  • price:成交股數 - Trading volume

Fundamental Data:

  • price_earning_ratio:本益比 - P/E ratio
  • price_earning_ratio:股價淨值比 - P/B ratio
  • fundamental_features:股東權益報酬率 - ROE
  • financial_statement:每股盈餘 - EPS

Institutional Trading Data:

  • institutional_investors_trading_summary:投信買賣超股數 - Investment trust net buy/sell (shares)
  • margin_transactions:融資使用率 - Margin financing utilization rate
  • etl:外資持股比例 - Foreign investor shareholding ratio

Monthly Revenue:

  • monthly_revenue:當月營收 - Revenue for the current month
  • monthly_revenue:去年同月增減(%) - Change versus the same month last year (%)

See the full list at the Database Catalog.

Common Errors

  • KeyError: Data table name is wrong or API token is not set
  • Empty DataFrame: Query conditions too strict or data does not exist
  • Out of memory: Too much data downloaded, use truncate_start to limit range
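Since a KeyError can mean either a misspelled table name or a missing token, it can help to attach close matches to the error. The following is a minimal sketch, not part of finlab: `get_with_suggestions` is a hypothetical helper that takes the fetch and search functions as arguments, and the two `fake_*` stubs stand in for `data.get` and `data.search` so the example runs offline.

```python
import difflib


def get_with_suggestions(get_fn, search_fn, name):
    """Call get_fn(name); on KeyError, re-raise with close matches from search_fn()."""
    try:
        return get_fn(name)
    except KeyError as err:
        close = difflib.get_close_matches(name, search_fn(), n=3, cutoff=0.6)
        hint = f" Did you mean: {close}?" if close else " Check that you are logged in."
        raise KeyError(f"{name!r} not found.{hint}") from err


# Offline stubs standing in for data.get and data.search.
_tables = {"price:收盤價": "closing prices", "price:開盤價": "opening prices"}


def fake_get(name):
    return _tables[name]


def fake_search():
    return list(_tables)


try:
    get_with_suggestions(fake_get, fake_search, "price:收盤")
except KeyError as err:
    print(err)  # the message lists 'price:收盤價' among the suggestions
```

In real use you would pass `data.get` and `data.search` instead of the stubs.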

data.search()

finlab.data.search

search(keyword=None, market=None)

Search the data fields available in the FinLab database.

PARAMETER DESCRIPTION
keyword

Search keyword. If None, all fields are listed.

TYPE: str DEFAULT: None

market

Market selection ('tw', 'us', 'hk', 'jp', 'kr', 'uk', 'all'). Defaults to the market set with data.set_market(), or 'tw' if none has been set.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
list

List of dataset names that can be passed to data.get(), in "table:column" format.

TYPE: list

Examples:

# List all Taiwan stock data
tw_data = data.search()

# Search Taiwan stock fields containing '收盤' (close)
close_data = data.search('收盤', market='tw')
# ['price:收盤價']

# Search US stock fields containing 'close'
us_close = data.search('close', market='us')
# ['us_price:close', 'us_div_adj_price:adj_close', ...]

# Search Japan stock fields containing 'price'
jp_price = data.search('price', market='jp')

# Search fields containing 'price' across all markets
all_price = data.search('price', market='all')

Examples:

# Search for fields containing "股東" (shareholder)
data.search('股東')
# ['fundamental_features:股東權益報酬率', 'internal_equity_pledge:百分之十以上大股東持有股數', ...]

# Search for US stock PE ratio
data.search('pe', market='us')

# List all Taiwan stock fields
all_fields = data.search()
print(f"Total {len(all_fields)} fields")
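Because search() returns names in "table:column" format, the result is easy to regroup by table. A minimal sketch follows; `group_by_table` is a hypothetical helper, and the sample list reuses names from the examples above rather than a live `data.search()` call.

```python
from collections import defaultdict


def group_by_table(names):
    """Group 'table:column' names into {table: [column, ...]}."""
    grouped = defaultdict(list)
    for name in names:
        table, _, column = name.partition(":")
        grouped[table].append(column)
    return dict(grouped)


# Sample names from the examples above; a real list would come from data.search().
sample = ["price:收盤價", "etl:不含除權息收盤價", "price:開盤價"]
print(group_by_table(sample))
# {'price': ['收盤價', '開盤價'], 'etl': ['不含除權息收盤價']}
```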

data.universe()

finlab.data.universe

universe

universe(exchange='ALL', sector='ALL', exclude_sector=None, industry='ALL', asset_type=None, *, market=None, category=None, exclude_category=None)

Context manager to set a global stock universe filter for data retrieval.

Auto-dispatches TW vs international logic based on data._current_market.

Parameters

exchange : str | list[str], default 'ALL'
    TW: 'TWSE'/'TPEx' (or legacy 'TSE'/'OTC'). International: 'NASDAQ'/'NYSE'/'AMEX'/'HKEX'/'TSE'/etc.
sector : str | list[str], default 'ALL'
    Sector name(s) with regex matching.
exclude_sector : str | list[str] | None, default None
    Sector(s) to exclude (TW only).
industry : str | list[str], default 'ALL'
    Industry filter (international markets).
asset_type : str | None, default None
    TW only: 'ETF' or 'STOCK_FUTURE'.
market : str | None
    Legacy alias for TW exchange+asset_type.
category : str | list[str] | None
    Legacy alias for sector.
exclude_category : str | list[str] | None
    Legacy alias for exclude_sector.

Examples

TW market (default):

from finlab import data
with data.universe(exchange=['TWSE', 'TPEx'], sector=['鋼鐵工業', '航運業']):
    close = data.get('price:收盤價')

Legacy TW usage (still works):

with data.universe(market='TSE_OTC', sector='水泥', exclude_sector='ETF'):
    close = data.get('price:收盤價')

US market:

data.set_market('us')
with data.universe(exchange='NASDAQ', sector='Technology'):
    close = data.get('price:close')

JP market:

data.set_market('jp')
with data.universe(sector='Technology'):
    close = data.get('price:close')
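The sector parameter is described as using regex matching, which is why a partial name such as '水泥' can select '水泥工業'. The sketch below illustrates that idea with Python's `re.search`. It is an assumption about the matching semantics, not finlab's actual implementation, and `match_sectors` is a hypothetical helper.

```python
import re


def match_sectors(patterns, sectors):
    """Return the sectors matched by any pattern, using re.search (substring-style)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    compiled = [re.compile(p) for p in patterns]
    return [s for s in sectors if any(c.search(s) for c in compiled)]


# Sector names taken from the examples in this document.
sectors = ["水泥工業", "食品工業", "鋼鐵工業", "航運業", "半導體業"]

print(match_sectors("水泥", sectors))      # ['水泥工業']
print(match_sectors(["工業$"], sectors))   # ['水泥工業', '食品工業', '鋼鐵工業']
```

If the matching works this way, a short pattern may select more sectors than intended, so it is worth checking the resulting columns.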

us_universe

us_universe(sector='ALL', industry='ALL', exchange='ALL')

Context manager to set a global stock universe filter for US market data retrieval.

This context manager limits the set of US stocks returned by data functions to a specific sector, industry, and exchange selection. The filter is applied globally within the context and is restored after the context exits.

Parameters

sector : str | list[str], default 'ALL'
    Sector name(s) to include. Supports regex-like substring matching.
industry : str | list[str], default 'ALL'
    Industry name(s) to include. Supports regex-like substring matching.
exchange : str | list[str], default 'ALL'
    Exchange name(s) to include. Common values: 'NASDAQ', 'NYSE', 'AMEX'.

Examples

from finlab import data
with data.us_universe(sector='Technology', exchange='NASDAQ'):
    close = data.get('us_price:close')

set_universe

set_universe(exchange='ALL', sector='ALL', exclude_sector=None, industry='ALL', asset_type=None, *, market=None, category=None, exclude_category=None)

Set global stock universe filter. Auto-dispatches based on current market.

When data.set_market('tw') (or no market set), uses TW logic (security_categories). For any other market (us, hk, jp, kr, uk), loads {market}_company_profile and filters by available columns.

Parameters

exchange : str | list[str], default 'ALL'
    TW: 'TWSE', 'TPEx' (or legacy 'TSE', 'OTC'). International: 'NASDAQ', 'NYSE', 'AMEX', 'HKEX', 'TSE', etc.
sector : str | list[str], default 'ALL'
    Sector name(s) with regex matching.
exclude_sector : str | list[str] | None, default None
    Sector(s) to exclude (TW only).
industry : str | list[str], default 'ALL'
    Industry filter (international markets).
asset_type : str | None, default None
    TW only: 'ETF' or 'STOCK_FUTURE'.
market : str | None
    Legacy alias for TW exchange+asset_type (e.g. 'TSE_OTC', 'ETF').
category : str | list[str] | None
    Legacy alias for sector.
exclude_category : str | list[str] | None
    Legacy alias for exclude_sector.

set_us_universe

set_us_universe(sector='ALL', industry='ALL', exchange='ALL')

Set global US stock universe filter.

Thin wrapper around _set_intl_universe_impl for backward compatibility.

Parameters

sector : str | list[str], default 'ALL'
    Sector filter with regex-like substring matching.
industry : str | list[str], default 'ALL'
    Industry filter with regex-like substring matching.
exchange : str | list[str], default 'ALL'
    Exchange filter (e.g., 'NASDAQ', 'NYSE', 'AMEX').

Examples:

# Example 1: Only listed companies
with data.universe(market='TSE'):
    close = data.get('price:收盤價')
    print(f"Number of listed companies: {len(close.columns)}")

# Example 2: Specific industry sectors
with data.universe(category=['半導體業']):
    close = data.get('price:收盤價')

# Example 3: ETFs only (asset_type is TW-only)
with data.universe(asset_type='ETF'):
    close = data.get('price:收盤價')

# Example 4: Combined conditions
with data.universe(market='TSE_OTC', category=['電子工業']):
    close = data.get('price:收盤價')

Available market parameter values:

  • 'TSE' - Listed (Taiwan Stock Exchange)
  • 'OTC' - OTC (Taipei Exchange)
  • 'TSE_OTC' - Listed + OTC
  • 'ALL' - All (including Emerging Stock Board)

data.us_universe()

finlab.data.us_universe

us_universe(sector='ALL', industry='ALL', exchange='ALL')

Context manager to set a global stock universe filter for US market data retrieval.

This context manager limits the set of US stocks returned by data functions to a specific sector, industry, and exchange selection. The filter is applied globally within the context and is restored after the context exits.

Parameters

sector : str | list[str], default 'ALL'
    Sector name(s) to include. Supports regex-like substring matching.
industry : str | list[str], default 'ALL'
    Industry name(s) to include. Supports regex-like substring matching.
exchange : str | list[str], default 'ALL'
    Exchange name(s) to include. Common values: 'NASDAQ', 'NYSE', 'AMEX'.

Examples

from finlab import data
with data.us_universe(sector='Technology', exchange='NASDAQ'):
    close = data.get('us_price:close')

US Market Filtering:

# Only NYSE-listed stocks
with data.us_universe(exchange='NYSE'):
    close = data.get('us_price:close')

# Technology sector on NASDAQ or NYSE
with data.us_universe(sector='Technology', exchange=['NASDAQ', 'NYSE']):
    close = data.get('us_price:close')

data.indicator()

finlab.data.indicator

indicator(indname, adjust_price=False, resample='D', **kwargs)

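Supports hundreds of technical indicators from TA-Lib and pandas_ta, computed across about 2,000 stocks and 10 years of data.

Before using this function, install the packages that compute the technical indicators.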

PARAMETER DESCRIPTION
indname

Indicator name. For TA-Lib, examples include SMA, STOCH, and RSI; see the TA-Lib documentation.

For pandas-ta, examples include supertrend and ssf; see the pandas-ta documentation.

TYPE: str

adjust_price

Whether to compute the indicator from adjusted prices.

TYPE: bool DEFAULT: False

resample

Price period for the indicator, e.g. D for daily, W for weekly, M for monthly.

TYPE: str DEFAULT: 'D'

market

Market selection, e.g. TW_STOCK for Taiwan stocks, US_STOCK for US stocks.

TYPE: str

**kwargs

Parameter settings for the indicator. Taking TA-Lib's RSI as an example, the adjustable parameter is the calculation period, timeperiod=14.

TYPE: dict DEFAULT: {}

We recommend starting from the examples below and consulting the official TA-Lib documentation to learn how to build technical indicators.

Technical Indicator Examples:

from finlab import data

# Get MACD indicator (TA-Lib name, default parameters)
macd = data.indicator('MACD')

# Get RSI indicator with a 14-period window
rsi = data.indicator('RSI', timeperiod=14)

Cache Management

finlab.data.set_storage

set_storage(storage)

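Set how historical data is stored locally. When data.get fetches historical data, the program by default keeps a local copy automatically to avoid re-downloading large amounts of data. storage is the interface used to store that data.

Two storage interfaces are provided: finlab.data.CacheStorage, which keeps data in memory, and finlab.data.FileStorage, which keeps data in files. See CacheStorage and FileStorage for details.

By default, the program uses finlab.data.FileStorage and stores repeatedly requested historical data in the operating system's default temporary folder.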

PARAMETER DESCRIPTION
storage

The interface of storage

TYPE: Storage

Examples:

To switch to file-based storage:

from finlab import data
data.set_storage(data.FileStorage())
close = data.get('price:收盤價')

The downloaded data is then stored locally at ./finlab_db/price#收盤價.pickle and can be loaded with pickle:

import pickle
with open('finlab_db/price#收盤價.pickle', 'rb') as f:
    close = pickle.load(f)

finlab.data.CacheStorage

CacheStorage()

Stores historical data in an in-memory cache.

Examples:

To switch to in-memory cache storage:

from finlab import data
data.set_storage(data.CacheStorage())
close = data.get('price:收盤價')

The cached data can be accessed directly:

close = data._storage._cache['price:收盤價']

finlab.data.FileStorage

FileStorage(path=None, use_cache=True)

Stores historical data in files.

PARAMETER DESCRIPTION
path

Path where the data is stored.

TYPE: str DEFAULT: None

use_cache

Whether to additionally keep a copy of the data in an in-memory cache.

TYPE: bool DEFAULT: True

Examples:

To switch to file-based storage:

from finlab import data
data.set_storage(data.FileStorage())
close = data.get('price:收盤價')

The downloaded data is then stored locally at ./finlab_db/price#收盤價.pickle and can be loaded with pickle:

import pickle
with open('finlab_db/price#收盤價.pickle', 'rb') as f:
    close = pickle.load(f)
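The example path suggests that the dataset name is mapped to a file name by replacing ':' with '#' and appending '.pickle'. The sketch below encodes that reading. It is an inference from the single example above, not a documented rule, and `storage_path` is a hypothetical helper.

```python
from pathlib import Path


def storage_path(dataset, root="finlab_db"):
    """Guess the pickle path for a dataset name (assumes ':' is replaced by '#')."""
    return Path(root) / (dataset.replace(":", "#") + ".pickle")


path = storage_path("price:收盤價")
print(path.name)  # price#收盤價.pickle
```

A helper like this is convenient for checking whether a dataset has already been saved, for example with `storage_path(name).exists()`.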

diagnose

diagnose(dataset=None)

Diagnose the state of local storage.

PARAMETER DESCRIPTION
dataset

Name of the dataset to check, e.g. 'price:收盤價'. If omitted, all locally stored data is listed.

TYPE: str DEFAULT: None

Examples:

from finlab import data
data._storage.diagnose()  # List all locally stored data
data._storage.diagnose('price:收盤價')  # Check a specific dataset

Custom Cache Strategy:

from finlab import data
from finlab.data import set_storage, FileStorage

# Use a custom directory
storage = FileStorage('/path/to/custom/cache')
set_storage(storage)

# All subsequent data will be cached to the specified location
close = data.get('price:收盤價')

Other Utilities

finlab.data.get_strategies

get_strategies(api_token=None)

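Retrieve data for strategies that have been uploaded to the quant platform.

This returns the data shown on your own strategy dashboard, such as each strategy's return curve, return statistics, Sharpe ratio, recent positions, and recent rebalance dates. These data can be used to build multi-strategy aggregation applications.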

PARAMETER DESCRIPTION
api_token

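If no api_token has been supplied to the finlab module, a GUI page opens automatically; copy the api_token shown on the web page and paste it into the input field.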

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
dict

strategies data

Response detail:

``` py
{
  strategy1:{
    'asset_type': '',
    'drawdown_details': {
       '2015-06-04': {
         'End': '2015-11-03',
         'Length': 152,
         'drawdown': -0.19879090089478024
         },
         ...
      },
    'fee_ratio': 0.000475,
    'last_trading_date': '2022-06-10',
    'last_updated': 'Sun, 03 Jul 2022 12:02:27 GMT',
    'ndays_return': {
      '1': -0.01132480035770611,
      '10': -0.0014737286933147464,
      '20': -0.06658015749110646,
      '5': -0.002292995729485159,
      '60': -0.010108700314771735
      },
    'next_trading_date': '2022-06-10',
    'positions': {
      '1413 宏洲': {
        'entry_date': '2022-05-10',
        'entry_price': 10.05,
        'exit_date': '',
        'next_weight': 0.1,
        'return': -0.010945273631840613,
        'status': '買進',
        'weight': 0.1479332345384493
        },
      'last_updated': 'Sun, 03 Jul 2022 12:02:27 GMT',
      'next_trading_date': '2022-06-10',
      'trade_at': 'open',
      'update_date': '2022-06-10'
      },
    'return_table': {
      '2014': {
        'Apr': 0.0,
        'Aug': 0.06315180932606546,
        'Dec': 0.0537589857541485,
        'Feb': 0.0,
        'Jan': 0.0,
        'Jul': 0.02937490104459939,
        'Jun': 0.01367930162104769,
        'Mar': 0.0,
        'May': 0.0,
        'Nov': -0.0014734320286596825,
        'Oct': -0.045082529665408266,
        'Sep': 0.04630906972509852,
        'YTD': 0.16626214846456966
        },
        ...
      },
    'returns': {
      'time': [
        '2014-06-10',
        '2014-06-11',
        '2014-06-12',
        ...
        ],
      'value': [
        100,
        99.9,
        100.2,
        ...
        ]
      },
    'stats': {
      'avg_down_month': -0.03304015302646822,
      'avg_drawdown': -0.0238021414698247,
      'avg_drawdown_days': 19.77952755905512,
      'avg_up_month': 0.05293384465715908,
      'cagr': 0.33236021285588846,
      'calmar': 1.65261094975066,
      'daily_kurt': 4.008888367138843,
      'daily_mean': 0.3090784769257415,
      'daily_sharpe': 1.747909002374217,
      'daily_skew': -0.6966018726321078,
      'daily_sortino': 2.8300677082214034,
      ...
      },
    'tax_ratio': 0.003,
    'trade_at': 'open',
    'update_date': '2022-06-10'
    },
  strategy2:{...},
  ...}
```
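Given the response shape above, a multi-strategy overview can be built by pulling a few entries out of each strategy's 'stats' dictionary. The sketch below is a minimal illustration: `summarize` is a hypothetical helper, and the sample dictionary is a hand-trimmed copy of the response above rather than a live `get_strategies()` result.

```python
def summarize(strategies, keys=("cagr", "daily_sharpe")):
    """Return {strategy_name: {key: value}} picked from each strategy's 'stats'."""
    summary = {}
    for name, detail in strategies.items():
        stats = detail.get("stats", {})
        # Missing statistics are reported as None rather than raising.
        summary[name] = {key: stats.get(key) for key in keys}
    return summary


# Trimmed sample following the response layout shown above.
sample = {
    "strategy1": {
        "stats": {"cagr": 0.33236021285588846, "daily_sharpe": 1.747909002374217},
        "trade_at": "open",
    },
    "strategy2": {"stats": {}},
}

for name, row in summarize(sample).items():
    print(name, row)
```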

FAQ

Q: How do I find out what data is available for download?

Method 1: Use search()

# List all fields
all_fields = data.search()
for field in all_fields[:10]:
    print(field)

Method 2: Check the online database

Visit the FinLab Database Catalog to browse the full list.

Q: Data download is slow, what can I do?

# Method 1: Limit time range
data.truncate_start = '2020-01-01'

# Method 2: Use cache (second call will be fast)
close = data.get('price:收盤價')  # First call is slow
close = data.get('price:收盤價')  # Second call is fast (uses cache)

# Method 3: Use universe to narrow the stock set
with data.universe(category=['半導體業']):
    close = data.get('price:收盤價')  # Only returns stocks in the selected sector

Q: KeyError: 'price:收盤價' - what should I do?

Possible causes:

1. Not logged in - Run finlab.login() or finlab.login('YOUR_TOKEN')
2. Incorrect field name - Use data.search('收盤') to verify the correct name
3. Invalid API token - Re-obtain the token

import finlab

# Check if logged in
try:
    token, token_type = finlab.get_token()
    print(f"Logged in ({token_type})")
except Exception:
    print("Not logged in, please run finlab.login()")

Q: How do I download US stock data?

from finlab import data

# US stock closing prices
us_close = data.get('us_price:close')

# Search US stock fields
us_fields = data.search(market='us')

Q: What about missing values in the data?

close = data.get('price:收盤價')

# Check missing values
print(f"Missing value ratio: {close.isna().sum().sum() / close.size:.2%}")

# Fill missing values
close_filled = close.ffill()  # Forward fill

# Or drop stocks with too many missing values
close_clean = close.dropna(axis=1, thresh=int(len(close) * 0.8))  # Keep stocks with 80%+ data

Q: How can I save memory?

# Method 1: Limit time range
data.truncate_start = '2020-01-01'

# Method 2: Process in batches
close = data.get('price:收盤價')
all_stocks = close.columns
for batch in [all_stocks[i:i+100] for i in range(0, len(all_stocks), 100)]:
    batch_close = close[batch]
    # Process 100 stocks at a time...

# Method 3: Only download the fields you need
# Avoid calling data.get() for too many tables at once
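The batch loop in Method 2 can also be written as a small generator, which yields one slice at a time instead of building the full list of batches first. This is a generic sketch: `batches` is a hypothetical helper, and a plain list of stock IDs stands in for `close.columns`.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in for close.columns; the IDs come from the sample table earlier in this document.
stock_ids = ["0015", "0050", "0051", "0052", "0053"]

for batch in batches(stock_ids, 2):
    print(batch)
# ['0015', '0050']
# ['0051', '0052']
# ['0053']
```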

Resources