Abstract
The Triple-Barrier Method is a technique for labelling data in financial machine learning, introduced in Chapter 3 of Marcos López de Prado's "Advances in Financial Machine Learning." It labels an observation according to the first of three barriers touched. The method is designed to address the drawbacks of the conventional fixed-time horizon method, which relies on time bars that do not exhibit good statistical properties and applies the same threshold regardless of observed volatility, so its labels do not reflect the current state of the investment. The Triple-Barrier Method is path-dependent and allows for sounder decisions based on the number of days the stock is held and what happens to the stock during that period. The original code for the method was written for high-frequency data, but it has been tweaked here to work with daily data. The method frames each trade with four lines: a starting date, a stop-loss exit price (the bottom barrier), a profit-taking exit price (the top barrier), and a vertical barrier, which is the starting date plus the number of days the stock is planned to be held. The two price barriers are set to reflect the risks involved in a bet, for example by only taking trades that offer a 3:1 earnings ratio.
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
Warning: There is no magical formula or Holy Grail here, though a new world might open the door for you.
Note 1: Instructions for installing the mlfinlab package without error messages can be found here.
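The triple-barrier method labels an observation according to the first barrier touched out of three barriers, as introduced in Chapter 3 of Advances in Financial Machine Learning by Marcos López de Prado¹. The conventional way to label the data is the fixed-time horizon method, typically using the next-day return. This method can be described as follows.
$$y_i = \begin{cases} -1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} < -\tau \\ 0 & \text{if } |r_{t_{i,0},\,t_{i,0}+h}| \le \tau \\ 1 & \text{if } r_{t_{i,0},\,t_{i,0}+h} > \tau \end{cases}$$
and
$$r_{t_{i,0},\,t_{i,0}+h} = \frac{p_{t_{i,0}+h}}{p_{t_{i,0}}} - 1$$
Here 𝜏 is a fixed threshold, t_{i,0} is the index of the bar immediately after observation i takes place, h is the horizon in bars, and p is the price.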
There are several drawbacks to this popular conventional labelling method. First, time bars do not exhibit good statistical properties. Second, the same threshold 𝜏 is applied regardless of the observed volatility. In short, the labelling doesn’t reflect the current state of the investment.
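To make the fixed-time horizon rule concrete, here is a minimal sketch on a handful of made-up closing prices. The function name fixed_horizon_labels, the 1% threshold and the toy prices are my own assumptions for illustration, not code from the book.

```python
import numpy as np
import pandas as pd

def fixed_horizon_labels(close, h=1, tau=0.01):
    """Label each bar by its h-bar forward return against a fixed threshold tau."""
    # forward return over the next h bars: p[t+h] / p[t] - 1
    fwd_ret = close.shift(-h) / close - 1
    labels = pd.Series(0.0, index=close.index)
    labels[fwd_ret > tau] = 1.0
    labels[fwd_ret < -tau] = -1.0
    # the last h bars have no forward return, so they cannot be labelled
    labels[fwd_ret.isna()] = np.nan
    return labels

close = pd.Series([100.0, 102.0, 101.5, 99.0, 99.5])
labels = fixed_horizon_labels(close)
print(labels)
```

Notice that tau stays at 1% whether the market is calm or wild, which is exactly the second drawback listed above.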
Moreover, in a real case, you may not want to sell the next day. The triple-barrier method therefore makes more sense in practice, as it is path-dependent. You can make sound decisions depending on how many days you plan to hold the stock and what happens to the stock during that period.
The original code from Chapter 3 of Advances in Financial Machine Learning was written for high-frequency trading and uses high-frequency data, most of it intraday. If you are using daily data, you need to tweak the code a little. I also refactored most of the code from the book to make it beginner-friendly, relying heavily on the pandas DataFrame structure to store all the information in one place. This makes life much easier later on when you start to analyze or plot the data. At the same time, I employed more sophisticated approaches, such as using the Average True Range as the daily volatility. You can see all the code at the end of this article.
Intuition
The intuition is similar to finding outliers, as described in my previous articles. Outliers are like breakouts in stock trading: they define the barriers and form a window in which you make a buy or sell decision. If you haven’t read those articles, you can always go back to here, here and here.
We buy a stock (let’s say Apple) and hold it for 10 days. If the price goes down and triggers the stop-loss alarm, we exit at the stop-loss limit; if the price goes up, we take the profit at a certain point. In the extreme case where the stock price goes sideways, we exit on a certain day after holding it for a while.
Assume we have a simple equity management rule:
Never risk more than 2% of your total capital in a trade.
Always look to trade only those opportunities where you will have a 3:1 earnings ratio.
Based on those simple rules, we make a trading plan before we put real money into any stock. To infuse that trading plan into the stock price movement, we need 3 barriers. What are those 3 barriers? Four lines form a frame that defines a window, as shown below.
The x-axis is the datetime and the y-axis is the stock price. Lines a and d belong to the x-axis, which is the datetime index, and lines b and c belong to the y-axis, which is the stock price.
a: starting date
b: stop-loss exit price
c: the profit-taking exit price
d: starting date + the number of days you are planning to hold it.
b and c don’t have to be the same distance from the entry price. Remember, we want to set profit-taking and stop-loss limits that are a function of the risks involved in a bet, and we are always looking to trade only those opportunities where we will have a 3:1 earnings ratio. Here, setting the profit-taking distance to three times the stop-loss distance (loosely, c = 3 * b) will do the trick.
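Here is a small worked example of how the two equity management rules turn into barrier prices and a trade size. It is only a sketch: plan_trade, the $10,000 account, the $100 entry price and the $2 stop distance are all hypothetical numbers I picked for illustration.

```python
def plan_trade(entry_price, stop_distance, capital,
               risk_fraction=0.02, reward_ratio=3):
    """Return (stop-loss price, profit-taking price, number of shares)."""
    # line b: exit if the price falls by stop_distance
    stop_loss = entry_price - stop_distance
    # line c: aim for reward_ratio times the amount put at risk
    take_profit = entry_price + reward_ratio * stop_distance
    # never risk more than risk_fraction of the total capital
    max_loss = capital * risk_fraction
    shares = int(max_loss // stop_distance)
    return stop_loss, take_profit, shares

stop_loss, take_profit, shares = plan_trade(
    entry_price=100.0, stop_distance=2.0, capital=10_000.0)
print(stop_loss, take_profit, shares)
```

With these numbers the stop-loss sits $2 below the entry, the profit target $6 above it, and the position is sized so that hitting the stop costs 2% of the account.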
There are few videos on this topic; I found just one on YouTube.
OK, without further ado, let’s dive into the code.
1. Data preparation
For consistency, in all the 📈Python for finance series, I will try to reuse the same data as much as I can. More details about data preparation can be found here, here and here, or you can refer back to my previous article. If you like, you can ignore all the code below and use whatever clean data you have at hand; it won’t affect the things we are going to do together.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before matplotlib 3.6
If you run the getDailyVol function from the book as printed, you will get an error message, SyntaxError: invalid character in identifier. That is because close.index[df0–1] contains an en dash (–) instead of a minus sign (-). It can be fixed like this:
def getDailyVol(close, span0=100):
    # daily vol, reindexed to close
    df0 = close.index.searchsorted(close.index - pd.Timedelta(days=1))
    df0 = df0[df0 > 0]
    a = df0 - 1  # using a variable to avoid the error message
    df0 = pd.Series(close.index[a],
                    index=close.index[close.shape[0] - df0.shape[0]:])
    # daily returns
    df0 = close.loc[df0.index] / close.loc[df0.values].values - 1
    df0 = df0.ewm(span=span0).std()
    return df0
If you use daily data instead of intraday data, you will end up with lots of duplicates, because the date is moved back by one day, and with many NaN values later on, because many of those dates will be non-business days.
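For daily bars, one simple alternative is to take the EWM standard deviation of plain day-over-day percentage returns, which is what the get_Daily_Volatility function in the full listing at the end of this article does. Below is a minimal sketch of that idea on synthetic prices. The function name daily_vol_from_daily_bars and the random data are made up for illustration.

```python
import numpy as np
import pandas as pd

def daily_vol_from_daily_bars(close, span=20):
    """EWM standard deviation of simple daily returns."""
    # bar-to-bar percentage change, so no calendar arithmetic is involved
    returns = close.pct_change()
    # drop the leading rows where the estimate is not defined yet
    return returns.ewm(span=span).std().dropna()

# synthetic business-day prices, purely for demonstration
dates = pd.bdate_range("2021-01-04", periods=60)
rng = np.random.default_rng(0)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 60)), index=dates)

vol = daily_vol_from_daily_bars(close)
print(vol.tail())
```

Because pct_change works on consecutive rows rather than on calendar dates, weekends and holidays never enter the picture.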
The first step in calculating ATR is to find a series of true range values for a stock price. The price range of an asset for a given trading day is simply its high minus its low, while the true range is the greatest of three values: the current high less the current low; the absolute value of the current high less the previous close; and the absolute value of the current low less the previous close. The average true range is then a moving average, generally using 14 days, of the true ranges.
def get_atr(stock, win=14):
    # the previous day's close, aligned with the current day's high and low
    prev_close = stock.close.shift(1)
    # true range: the greatest of the three ranges described above
    tr = pd.concat([stock.high - stock.low,
                    (stock.high - prev_close).abs(),
                    (stock.low - prev_close).abs()],
                   axis=1).max(axis=1)
    # average true range: moving average of the true range over win days
    atr_df = tr.rolling(win, min_periods=win).mean()
    return atr_df

atr_df = get_atr(Apple_stock, 14)
atr_df
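A quick way to sanity-check any ATR routine is to run the arithmetic by hand on a tiny table. The three rows below are made-up numbers, and the 2-day window is chosen only to keep the example short.

```python
import pandas as pd

toy = pd.DataFrame({
    "high":  [10.0, 11.0, 12.0],
    "low":   [9.0, 10.5, 10.0],
    "close": [9.5, 10.8, 11.5],
})

prev_close = toy["close"].shift(1)
# greatest of: high - low, |high - previous close|, |low - previous close|
true_range = pd.concat([
    toy["high"] - toy["low"],
    (toy["high"] - prev_close).abs(),
    (toy["low"] - prev_close).abs(),
], axis=1).max(axis=1)

# 2-day simple moving average of the true range
atr_2 = true_range.rolling(2).mean()
print(true_range.tolist())
print(atr_2.tolist())
```

On the second day the high-low range is only 0.5, but the stock gapped up from the previous close of 9.5, so the true range is 1.5. That gap is what the plain daily range misses.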
3. Triple-Barrier
Before we start to work on the barriers, a few parameters need to be decided.
# set the boundary of barriers, based on 20 days EWM
daily_volatility = get_Daily_Volatility(price)
# how many days we hold the stock, which sets the vertical barrier
t_final = 10
# the upper and lower boundary multipliers
upper_lower_multipliers = [2, 2]
# align the index
prices = price[daily_volatility.index]
Here, I will use pd.DataFrame as the container to add all the information into one place.
def get_3_barriers():
    # create a container
    barriers = pd.DataFrame(columns=['days_passed', 'price', 'vert_barrier',
                                     'top_barrier', 'bottom_barrier'],
                            index=daily_volatility.index)
    # Series.iteritems() was removed in pandas 2.0; items() is the equivalent
    for day, vol in daily_volatility.items():
        days_passed = len(daily_volatility.loc[daily_volatility.index[0]:day])
        # set the vertical barrier
        if (days_passed + t_final < len(daily_volatility.index)
                and t_final != 0):
            vert_barrier = daily_volatility.index[days_passed + t_final]
        else:
            vert_barrier = np.nan
        # set the top barrier
        if upper_lower_multipliers[0] > 0:
            top_barrier = prices.loc[day] + prices.loc[day] * \
                upper_lower_multipliers[0] * vol
        else:
            # set it to NaN
            top_barrier = np.nan
        # set the bottom barrier
        if upper_lower_multipliers[1] > 0:
            bottom_barrier = prices.loc[day] - prices.loc[day] * \
                upper_lower_multipliers[1] * vol
        else:
            # set it to NaN
            bottom_barrier = np.nan
        # store this day's values in the container
        barriers.loc[day, ['days_passed', 'price', 'vert_barrier',
                           'top_barrier', 'bottom_barrier']] = \
            [days_passed, prices.loc[day], vert_barrier,
             top_barrier, bottom_barrier]
    return barriers

barriers = get_3_barriers()
Now let's have a close look at all the data information.
barriers.info()
Only vert_barrier has 11 NaN values at the end, as t_final was set to 10 days.
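As a rough cross-check on the barrier arithmetic, the same three columns can also be built without a loop. This is a sketch under my own assumptions rather than the code used in this article: build_barriers, the toy prices and the flat 1% volatility are all made up, and the vertical barrier here sits exactly t_final rows after the entry row.

```python
import numpy as np
import pandas as pd

def build_barriers(prices, vol, t_final=10, multipliers=(2, 2)):
    """Vectorized sketch: one row of barriers per entry date."""
    out = pd.DataFrame({"price": prices})
    # the date t_final rows after each entry date; NaT where the data run out
    dates = pd.Series(prices.index, index=prices.index)
    out["vert_barrier"] = dates.shift(-t_final)
    # price barriers scale with the entry price and its volatility
    out["top_barrier"] = prices * (1 + multipliers[0] * vol)
    out["bottom_barrier"] = prices * (1 - multipliers[1] * vol)
    return out

idx = pd.bdate_range("2020-01-01", periods=30)
prices = pd.Series(np.linspace(100, 110, 30), index=idx)
vol = pd.Series(0.01, index=idx)

toy = build_barriers(prices, vol)
print(toy.head())
print(toy["vert_barrier"].isna().sum())
```

This version leaves t_final empty vertical barriers at the end. The loop above indexes with days_passed + t_final, where days_passed already counts the entry day, which is why it reports 11.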
The next step is to label each entry according to which barrier was touched first. I add a new column ‘out’ to the end of barriers.
barriers['out'] = None
barriers.head()
Now, we can work on the labels.
def get_labels():
    '''
    start: first day of the window
    end: last day of the window
    price_initial: first day stock price
    price_final: last day stock price
    top_barrier: profit-taking limit
    bottom_barrier: stop-loss limit
    condition_pt: top_barrier touching condition
    condition_sl: bottom_barrier touching condition
    '''
    for i in range(len(barriers.index)):
        start = barriers.index[i]
        end = barriers.vert_barrier.iloc[i]
        if pd.notna(end):
            # assign the initial and final price
            price_initial = barriers.price.loc[start]
            price_final = barriers.price.loc[end]
            # assign the top and bottom barriers
            top_barrier = barriers.top_barrier.iloc[i]
            bottom_barrier = barriers.bottom_barrier.iloc[i]
            # set the profit-taking and stop-loss conditions
            condition_pt = (barriers.price.loc[start:end] >=
                            top_barrier).any()
            condition_sl = (barriers.price.loc[start:end] <=
                            bottom_barrier).any()
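Checking .any() on each condition tells us whether a barrier was touched inside the window, but not which one was touched first. Here is a minimal sketch of a first-touch rule for a single window. The helper name first_touch_label and the toy price path are hypothetical, and returning 0 when only the vertical barrier is reached is just one possible choice (the book also allows using the sign of the return).

```python
import numpy as np
import pandas as pd

def first_touch_label(path, top, bottom):
    """Label one window of prices running from entry to the vertical barrier.

    Returns 1 if the top barrier is touched first, -1 if the bottom
    barrier is touched first, and 0 if neither is touched.
    """
    hit_top = (path >= top).values
    hit_bottom = (path <= bottom).values
    n = len(path)
    # position of the first touch, or n if the barrier is never touched
    first_top = int(np.argmax(hit_top)) if hit_top.any() else n
    first_bottom = int(np.argmax(hit_bottom)) if hit_bottom.any() else n
    if first_top == n and first_bottom == n:
        return 0
    return 1 if first_top < first_bottom else -1

# the price dips through the stop-loss before it reaches the profit target
path = pd.Series([100.0, 101.0, 97.0, 104.0])
label = first_touch_label(path, top=103.0, bottom=98.0)
print(label)
```

In this path both barriers are touched within the window, so both .any() checks would be True. Only the order of the touches tells us the trade was stopped out.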
Profit-taking boundary: 2 times the 20-day EWM standard deviation of returns
Stop-loss boundary: 2 times the 20-day EWM standard deviation of returns
The rule we expect in the real case:
Always look to trade only those opportunities where you will have a 3:1 earnings ratio.
Never risk more than 2% of your total capital in a trade.
The first rule can be easily implemented by setting upper_lower_multipliers = [3, 1]. The second one is about the trade size: the side times the size will enable us to calculate the risk (margin/edge). That will be covered with meta-labelling in the next article. So, stay tuned!
Here is all the code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before matplotlib 3.6
def get_Daily_Volatility(close, span0=20):
    # simple percentage returns
    df0 = close.pct_change()
    # 20 days, a month EWM's std as boundary
    df0 = df0.ewm(span=span0).std()
    df0.dropna(inplace=True)
    return df0

df0 = get_Daily_Volatility(price)
def get_atr(stock, win=14):
    # the previous day's close, aligned with the current day's high and low
    prev_close = stock.close.shift(1)
    # true range: the greatest of high - low, |high - previous close|
    # and |low - previous close|
    tr = pd.concat([stock.high - stock.low,
                    (stock.high - prev_close).abs(),
                    (stock.low - prev_close).abs()],
                   axis=1).max(axis=1)
    # average true range: moving average of the true range over win days
    atr_df = tr.rolling(win, min_periods=win).mean()
    return atr_df
# set the boundary of barriers, based on 20 days EWM
daily_volatility = get_Daily_Volatility(price)
# how many days we hold the stock, which sets the vertical barrier
t_final = 10
# the upper and lower boundary multipliers
upper_lower_multipliers = [2, 2]
# align the index
prices = price[daily_volatility.index]
def get_3_barriers():
    # create a container
    barriers = pd.DataFrame(columns=['days_passed', 'price', 'vert_barrier',
                                     'top_barrier', 'bottom_barrier'],
                            index=daily_volatility.index)
    # Series.iteritems() was removed in pandas 2.0; items() is the equivalent
    for day, vol in daily_volatility.items():
        days_passed = len(daily_volatility.loc[daily_volatility.index[0]:day])
        # set the vertical barrier
        if (days_passed + t_final < len(daily_volatility.index)
                and t_final != 0):
            vert_barrier = daily_volatility.index[days_passed + t_final]
        else:
            vert_barrier = np.nan
        # set the top barrier
        if upper_lower_multipliers[0] > 0:
            top_barrier = prices.loc[day] + prices.loc[day] * \
                upper_lower_multipliers[0] * vol
        else:
            # set it to NaN
            top_barrier = np.nan
        # set the bottom barrier
        if upper_lower_multipliers[1] > 0:
            bottom_barrier = prices.loc[day] - prices.loc[day] * \
                upper_lower_multipliers[1] * vol
        else:
            # set it to NaN
            bottom_barrier = np.nan
        # store this day's values in the container
        barriers.loc[day, ['days_passed', 'price', 'vert_barrier',
                           'top_barrier', 'bottom_barrier']] = \
            [days_passed, prices.loc[day], vert_barrier,
             top_barrier, bottom_barrier]
    return barriers

barriers = get_3_barriers()
def get_labels():
    '''
    start: first day of the window
    end: last day of the window
    price_initial: first day stock price
    price_final: last day stock price
    top_barrier: profit-taking limit
    bottom_barrier: stop-loss limit
    condition_pt: top_barrier touching condition
    condition_sl: bottom_barrier touching condition
    '''
    for i in range(len(barriers.index)):
        start = barriers.index[i]
        end = barriers.vert_barrier.iloc[i]
        if pd.notna(end):
            # assign the initial and final price
            price_initial = barriers.price.loc[start]
            price_final = barriers.price.loc[end]
            # assign the top and bottom barriers
            top_barrier = barriers.top_barrier.iloc[i]
            bottom_barrier = barriers.bottom_barrier.iloc[i]
            # set the profit-taking and stop-loss conditions
            condition_pt = (barriers.price.loc[start:end] >=
                            top_barrier).any()
            condition_sl = (barriers.price.loc[start:end] <=
                            bottom_barrier).any()