Backtesting Stock Trading Strategies Using Python (Data Preparation)
Are you interested in investing in the stock market? Do you want to learn how to backtest your own trading strategies? In this article, we will walk through a simple Python script that gathers and prepares the stock market data you need to backtest them.

This code will download and process historical stock market data for the OIH ETF (Oil Services), resample it into several timeframes, and save each one to its own file. We will explain the code step by step, so you can follow along and use it as a template to gather data for other stocks.
Step 1: Downloading the data
The first step is to download the data. We will use the urllib library to download the data. Here’s the code:
import pandas as pd
from urllib.request import urlretrieve
# Download the adjusted OIH history and save it to a local text file
print('Downloading OIH_adjusted.txt...')
urlretrieve('http://api.kibot.com/?action=history&symbol=OIH&interval=1&unadjusted=0&bp=1&user=guest', 'OIH_adjusted.txt')
In this code, we import the necessary libraries and use the urlretrieve function to download the data. The data will be saved in a file called OIH_adjusted.txt. Because the script uses f-strings later on, it needs Python 3.6 or newer.
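Since the download address is just a query string, adapting this step to another ticker mostly comes down to changing the symbol parameter. Here is a minimal sketch of that idea. Note that build_history_url is a hypothetical helper (it is not part of the original script or of any library), and it assumes every other parameter stays exactly as in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://api.kibot.com/'

def build_history_url(symbol):
    """Hypothetical helper: rebuild the article's download URL for a symbol."""
    # Same parameters, in the same order, as the hard-coded URL
    params = {
        'action': 'history',
        'symbol': symbol,
        'interval': 1,
        'unadjusted': 0,
        'bp': 1,
        'user': 'guest',
    }
    return BASE_URL + '?' + urlencode(params)

url = build_history_url('OIH')
print(url)
```

The returned string can be passed to urlretrieve in place of the hard-coded one. The sketch only builds the address; it does not check whether the guest account actually serves a given symbol.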
Step 2: Reading and processing the data
Now that we have downloaded the data, we need to read and process it. We will use the pandas library to read the data from the file and assign names to the columns. Here’s the code:
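# Name the columns while reading (assumes the file has no header row)
df = pd.read_csv('OIH_adjusted.txt', header=None, names=['date','time','open','high','low','close','volume'])
In this code, we use the pd.read_csv function to read the data from the file into a pandas DataFrame. Passing header=None together with names assigns the column names while reading. This assumes the downloaded file has no header row; in that case, reading it with the default settings would silently consume the first bar as column names.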
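One detail is worth checking when reading a file without a header line: by default, pd.read_csv treats the first row as column names. The sketch below illustrates the difference on a few made-up rows (the values are invented for illustration and are not real OIH prices), assuming the file layout is date, time, open, high, low, close, volume with no header.

```python
import io
import pandas as pd

# Made-up sample rows in the assumed layout (no header line)
SAMPLE = (
    '01/02/2020,09:30,10.0,10.5,9.8,10.2,100\n'
    '01/02/2020,09:31,10.2,10.8,10.1,10.6,150\n'
    '01/02/2020,09:32,10.6,10.7,10.4,10.5,200\n'
)
COLUMNS = ['date', 'time', 'open', 'high', 'low', 'close', 'volume']

# Default settings: the first data row is consumed as the header
df_default = pd.read_csv(io.StringIO(SAMPLE))
df_default.columns = COLUMNS

# header=None keeps every row as data and names the columns in one step
df_named = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)

print(len(df_default), len(df_named))
```

Comparing the two row counts is a quick way to confirm whether a row is being lost on your own file.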
Step 3: Combining date and time and converting to datetime
The data we downloaded has the date and time in separate columns. We need to combine them into a single column and convert it to datetime format. Here’s the code:
df['date'] = df['date'] + ' ' + df['time']
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M')
df = df[['date','open','high','low','close','volume']]
In this code, we use string concatenation to combine the date and time columns into a single column. We then use the pd.to_datetime function to convert the column to datetime format. Finally, we select only the columns we need, which drops the now-redundant time column.
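Passing an explicit format matters here because a string like 01/02/2020 could be read as January 2 or as February 1. The following sketch shows the same combine-and-convert step on two made-up rows (illustrative values only), so you can check the result in isolation.

```python
import pandas as pd

# Made-up rows with separate date and time strings
df = pd.DataFrame({
    'date': ['01/02/2020', '01/02/2020'],
    'time': ['09:30', '09:31'],
    'close': [10.2, 10.6],
})

# Month comes first in the format string, so 01/02 is January 2
df['date'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%m/%d/%Y %H:%M')
df = df.drop(columns='time')

print(df.dtypes)
```

With the default settings, pd.to_datetime raises an error if a value does not match the given format, which helps catch malformed rows early.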
Step 4: Sorting by date and setting the date as index
Now that the date column is in datetime format, we can sort the DataFrame by date and set the date column as the index. Here’s the code:
df = df.sort_values('date').set_index('date')
Step 5: Converting the data to different timeframes and saving them for future use
Now that we have our data cleaned and sorted by date, we can start resampling it into different timeframes. In this case, we are going to resample the data into timeframes of 1 minute, 5 minutes, 15 minutes, 1 hour, and 1 day. This can be useful for analyzing the data at different levels of granularity.
We’ll use the resample method to resample the data into the different timeframes. The resample method groups the data by a given frequency and applies a function to each group. In this case, we'll apply the agg method to each group, which will aggregate the data in each group according to the AGGREGATION dictionary.
AGGREGATION = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'}
TIMEFRAMES = ['1T', '5T', '15T', '1H', '1D']
for timeframe in TIMEFRAMES:
    print(f'Converting & Saving {timeframe} Data...')
    # Resample from the original data each time instead of overwriting df
    resampled = df.resample(timeframe).agg(AGGREGATION).dropna()
    resampled.to_csv(f'OIH_{timeframe}.csv.gz', compression='gzip')
In the code above, we define a list of timeframes we want to resample the data into and iterate over each of them using a for loop. Inside the loop, we use the resample method to group the data into the given timeframe and apply the aggregation functions defined in the AGGREGATION dictionary. We then drop any rows that contain missing values (these come from periods with no trades, such as nights and weekends) and save the resulting DataFrame to a compressed CSV file using the to_csv method. Note that pandas 2.2 and later deprecate the 'T' and 'H' aliases in favor of 'min' and 'h'.
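To see what the aggregation and the dropna call actually do, here is a small sketch on three made-up 1-minute bars with a gap in the middle (the prices and volumes are invented for illustration). It uses '5min', which is the same 5-minute frequency as '5T'.

```python
import pandas as pd

AGGREGATION = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'}

# Made-up 1-minute bars with a gap between 09:31 and 09:40
index = pd.to_datetime(['2020-01-02 09:30', '2020-01-02 09:31', '2020-01-02 09:40'])
minute_bars = pd.DataFrame({
    'open':   [10.0, 10.2, 10.6],
    'high':   [10.5, 10.8, 10.7],
    'low':    [9.8, 10.1, 10.4],
    'close':  [10.2, 10.6, 10.5],
    'volume': [100, 150, 200],
}, index=index)

# Resampling creates a bin for 09:35 even though no bar falls inside it
with_gaps = minute_bars.resample('5min').agg(AGGREGATION)

# dropna removes that empty bin, whose price columns are all NaN
five_min = with_gaps.dropna()

print(with_gaps)
print(five_min)
```

The first 5-minute bar takes its open from the 09:30 row, its close from the 09:31 row, the highest high and lowest low of both, and the sum of their volumes. The empty 09:35 bin only exists before dropna is applied.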
The resulting files will be named OIH_1T.csv.gz, OIH_5T.csv.gz, OIH_15T.csv.gz, OIH_1H.csv.gz and OIH_1D.csv.gz, and they will contain the resampled data for each timeframe. These files can be loaded into Python or other tools for further analysis.
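Loading one of these files back is a single pd.read_csv call. The sketch below writes a tiny made-up DataFrame (illustrative values, saved under a temporary path rather than one of the real file names) and reads it back, restoring the datetime index.

```python
import os
import tempfile
import pandas as pd

# Made-up daily bars standing in for one of the saved files
bars = pd.DataFrame(
    {'open': [10.0, 10.6], 'high': [10.8, 10.7], 'low': [9.8, 10.4],
     'close': [10.6, 10.5], 'volume': [250, 200]},
    index=pd.DatetimeIndex(['2020-01-02', '2020-01-03'], name='date'),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample_1D.csv.gz')
    bars.to_csv(path, compression='gzip')
    # Compression is inferred from the .gz extension when reading
    loaded = pd.read_csv(path, index_col='date', parse_dates=True)

print(loaded)
```

For the real files, the same read_csv arguments should apply with a name such as OIH_1D.csv.gz, since the index column is saved under the name date.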
That’s it! With these few lines of code, we were able to download, clean, and resample financial data into different timeframes, which can be useful for backtesting trading strategies, analyzing market trends, and more.
In the next article, we will create a simple backtest environment to test different strategies using Python and the resampled data we just created. Stay tuned!
Twitter / X: https://twitter.com/diegodegese LinkedIn: https://www.linkedin.com/in/ddegese Github: https://github.com/crapher