Skip to content

Pandas Strategy Development Basics

1. Setup

import pandas as pd
import numpy as np
from finlab import data
from finlab.market import USMarket

data.set_market('us')

2. Create Data Objects

data.get() returns a DataFrame with dates as the index and stock symbols as columns. Selecting a single column gives you a Series.

# DataFrame: US adjusted close prices
df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends
df = df[df.index <= '2024-07-01']

# Series: single stock
series = df['AAPL']

3. Data Inspection

df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends

# head(n): get the first n rows, default is 5
demo_head = df.head(3)

# tail(n): get the last n rows, default is 5
demo_tail = df.tail(3)

# index: get the row index (dates)
demo_index = df.index

# columns: get the column labels (stock symbols)
demo_columns = df.columns

# values: get the underlying numpy array
demo_values = df.values

# Boolean indexing: filter rows by condition
# Example: get data from 2023 onwards
demo_boolean_indexing = df[df.index > '2023']

# loc: label-based selection
# Example: select all stocks for a specific date range
demo_loc1 = df.loc['2024-01-02':'2024-01-08']

# Example: select specific stocks (two equivalent methods)
demo_loc2 = df.loc[:, ['AAPL', 'MSFT', 'GOOGL']]
demo_column_select = df[['AAPL', 'MSFT', 'GOOGL']]

# Example: select specific date range AND specific stocks
demo_loc3 = df.loc['2024-01-02':'2024-01-08', ['AAPL', 'MSFT', 'GOOGL']]

# iloc: integer position-based selection
# Example: select rows 6 through 10
demo_iloc1 = df.iloc[5:10]

# Example: select columns 6 through 10
demo_iloc2 = df.iloc[:, 5:10]

# Example: select rows 6-10 AND columns 6-10
demo_iloc3 = df.iloc[5:10, 5:10]

# nlargest: get top n values
# Example: get the 10 highest-priced stocks on the most recent day
demo_nlargest = df.iloc[-1].nlargest(10)

# nsmallest: get bottom n values
# Example: get the 10 lowest-priced stocks on the most recent day
demo_nsmallest = df.iloc[-1].nsmallest(10)

4. Display Control

Pandas limits display to 10 rows by default to prevent excessive output. You can adjust this behavior globally -- note that changes affect all subsequent output in your session.

df = data.get('price:adj_close')

# Show all rows
pd.set_option("display.max_rows", None)

# Show all columns
pd.set_option("display.max_columns", None)

# Show at most 20 rows
pd.set_option("display.max_rows", 20)

# Show at most 20 columns
pd.set_option("display.max_columns", 20)

# Reset all display options to defaults
pd.reset_option("^display")

5. Data Operations

df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends

# Create a boolean DataFrame: True where price > 100
price_above_100 = df > 100

# sum: aggregate along an axis (axis=0 sums each column, axis=1 sums each row)
# Example: count how many stocks are priced above $100 each day
demo_sum = price_above_100.sum(axis=1)

# mean: average value (commonly used for computing moving averages)
demo_mean = df.mean()

# median: middle value
demo_median = df.median()

# std: standard deviation (commonly used for Bollinger Bands, deviation rate)
demo_std = df.std()

# max: maximum value
demo_max = df.max()

# min: minimum value
demo_min = df.min()

# cumsum: cumulative sum
# Example: cumulative dollar change
demo_diff = df.diff()
demo_cumsum = demo_diff.cumsum()

# cumprod: cumulative product
# Example: compute cumulative return
demo_pct_change = df.pct_change() + 1
demo_cumprod = demo_pct_change.cumprod()

# cummax: running maximum (commonly used for drawdown calculation)
demo_cummax = df.cummax()

# cummin: running minimum
demo_cummin = df.cummin()

# quantile: get the value at the nth percentile
# Example: get the 90th percentile price on the most recent day
demo_quantile = df.iloc[-1].quantile(0.9)

# corr: pairwise correlation
demo_corr = df.iloc[:, :5].corr()

# describe: summary statistics
demo_describe = df.describe()

6. Rolling Window Operations

df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends

# shift: move data forward/backward in time
# Example: shift prices down by 1 row (yesterday's price)
demo_shift1 = df.shift()
# Example: shift prices up by 1 row (tomorrow's price -- use with caution to avoid look-ahead bias)
demo_shift2 = df.shift(-1)

# rolling: apply a function over a sliding window
# Example: 20-day moving average (NaN for the first 9 rows, partial calculation starting from row 10)
demo_rolling = df.rolling(20, min_periods=10).mean()

# diff: row-to-row difference
# Example: daily price change in dollars
demo_diff = df.diff()

# pct_change: row-to-row percentage change
# Example: daily return
demo_pct_change = df.pct_change()

7. FinlabDataFrame Features

Beyond standard Pandas methods, data.get() returns a FinlabDataFrame with additional convenience methods designed for strategy development:

Method Description
average(n) Moving average with min_periods=int(n/2)
is_largest(n) Boolean mask: True for the top n values in each row
is_smallest(n) Boolean mask: True for the bottom n values in each row
rise(n=1) Boolean mask: True where the value is higher than n rows ago
fall(n=1) Boolean mask: True where the value is lower than n rows ago
sustain(nwindow, nsatisfy=None) Boolean mask: True where the rolling sum over nwindow rows exceeds nsatisfy
quantile_row(n) Returns the nth quantile value for each row
df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends

# 10-day moving average for AAPL and MSFT
df[['AAPL', 'MSFT']].average(10)

# Top 5 stocks by price over the last 10 days
df.iloc[-10:].is_largest(5)

# Bottom 5 stocks by price over the last 10 days
df.iloc[-10:].is_smallest(5)

# Did the price rise compared to yesterday?
df.iloc[-10:, 10:15].rise()

# Did the price fall compared to yesterday?
df.iloc[-10:, 10:15].fall()

# Did the price rise for 2 consecutive days?
df.iloc[-10:, 10:15].rise().sustain(2)

# Get the 90th percentile price across all stocks for the last 10 days
df.iloc[-10:].quantile_row(0.9)

8. Visualization

8-1. Bollinger Band Chart

Plot a stock's price with its Bollinger Bands (20-day moving average plus/minus 2 standard deviations).

df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends
aapl = df.iloc[-400:]['AAPL']

mean_20 = aapl.rolling(20).mean()
std_value = aapl.rolling(20).std()

up = mean_20 + std_value * 2
down = mean_20 - std_value * 2
up.plot(label='upper band', legend=True)
down.plot(label='lower band', legend=True)
mean_20.plot(label='MA 20', legend=True)
aapl.plot(title='AAPL Bollinger Bands', label='close', legend=True, figsize=(20, 8), grid=True)

8-2. Correlation Heatmap

Visualize pairwise correlations between multiple stocks using a styled heatmap.

df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends
check_list = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'NVDA', 'META']
demo_corr = df.iloc[-600:][check_list].corr()

demo_corr.style.background_gradient(cmap='viridis') \
    .set_properties(**{'font-size': '20px'})

9. Strategy Examples

9-1. Moving Average Alignment (Bullish)

Select stocks whose price is simultaneously above the 5-day, 10-day, 20-day, and 60-day moving averages. This multi-timeframe alignment indicates strong upward momentum. Take the top 10 by price.

from finlab import data
from finlab.backtest import sim
from finlab.market import USMarket

data.set_market('us')

df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends
df = df[(df.index > '2019') & (df.index < '2024')]

cond1 = df > df.rolling(5).mean()
cond2 = df > df.rolling(10, min_periods=5).mean()
cond3 = df > df.rolling(20, min_periods=10).mean()
cond4 = df > df.rolling(60, min_periods=40).mean()

position = (df * (cond1 & cond2 & cond3 & cond4))
position = position[position > 0]
position = position.is_largest(10)

report = sim(position, resample='M', market=USMarket(), fee_ratio=0.001, tax_ratio=0)
report.display()

9-2. Bollinger Band Upper Breakout

Select stocks whose closing price just broke above the upper Bollinger Band (20-day, 2 standard deviations). This signals a breakout from a period of consolidation. The price filter restricts the universe to stocks between $10 and $500 to avoid penny stocks and extremely high-priced names.

from finlab import data
from finlab.backtest import sim
from finlab.market import USMarket

data.set_market('us')

df = data.get('price:adj_close')
df = df[df.index.dayofweek < 5]  # remove weekends

mean_20 = df.rolling(20).mean()
std_value = df.rolling(20).std()

up = mean_20 + std_value * 2

# Price just broke above the upper Bollinger Band
cond1 = (df > up) & (df.shift() < up)
# Price between $10 and $500
cond2 = (df > 10) & (df < 500)

position = (cond1 & cond2)
position = position[position > 0]
position = position.is_smallest(20)

report = sim(position, resample='M', market=USMarket(), fee_ratio=0.001, tax_ratio=0)
report.display()