Create your own local OHLCV datasets using Python and the Binance API

Summary

The website provides a comprehensive guide on creating and updating local OHLCV datasets for cryptocurrency market data using Python and the Binance API.

Abstract

The article outlines a method for efficiently managing historical cryptocurrency market data by leveraging Python scripts to interact with the Binance API. It details the creation of a Python file, exchange.py, to handle API requests and data processing, ensuring a complete and continuous dataset. Additionally, it describes the utils.py file for managing local data files, optimized for handling large datasets. The guide also includes setting up a config.py file to specify the cryptocurrency pairs and timeframes of interest, and a main.py file to orchestrate the data retrieval and updating process. The script, once executed, automates the retrieval of new data and updates existing datasets, optimizing for time and computing power. The article emphasizes the benefits of local dataset management, such as time efficiency, reduced computing power usage, and the ability to work offline. It concludes with instructions on running the script and a mention of the author's GitHub repository for the complete program.

Opinions

The author advocates for the use of local datasets over repeated API calls for efficiency and robustness.
The most_recent_market_data() function in exchange.py is highlighted for its ability to overcome size limits and ensure data continuity.
The read_last_line() function in utils.py is praised for its efficiency in handling large datasets, avoiding the need to load entire files into memory.
The author suggests that the script's execution time will vary based on the number of pairs and the granularity of the timeframes selected.
The article promotes the author's GitHub repository, implying that it contains valuable resources for readers interested in the topic.
The author expresses appreciation for support on Medium and GitHub and offers a referral link for Medium membership, indicating a desire for community engagement and support for future work.

Learn how to pull historical data from the Binance API without a size limit. Update your datasets by loading only the most recent missing timestamps.

In my workflow, I use local datasets of historical market data instead of pulling data from the Binance exchange each time I need it. This saves the time and computing power otherwise allocated to this task, and it makes working offline simpler and more robust. All my coding projects are located in a single folder on my local machine. Each project that requires cryptocurrency data looks in the same single folder, which contains the datasets for a selection of cryptocurrency pairs. I wrote a small Python script whose purpose is to load the entire market data available on Binance for a given pair. Therefore, the oldest candlesticks stored in these CSV files are as old as Binance, which was created in 2017. Moreover, whenever I want to update these datasets with the most recent candles, the same single command executing the script in my terminal finds the latest candle in a given dataset and pulls data from the Binance API from this candle onward.

Step 1: Create an exchange.py file that will interact with the Binance API

This file will contain the minimum functions needed to send a request to the Binance API, read the response, and post-process the raw data to obtain a more readable DataFrame. The added value of this script lies in the most_recent_market_data() function. The klines() function can only return a limited number of candles in a single request. The most_recent_market_data() function therefore performs multiple requests to cover the full time range of data available on the exchange and to guarantee the time continuity of the resulting dataset.
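To make the pagination idea concrete, here is a minimal sketch of how chained requests could look. It is not the code from my repository: the signatures, the `fetch` parameter, and the use of the standard library instead of a Binance client are assumptions made for illustration. It relies on the public klines endpoint, which returns each candle as a list whose first element is the open time in milliseconds, and it assumes a start time of 0 makes the exchange answer from the oldest candle it has.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.binance.com/api/v3/klines"  # public REST endpoint
MAX_LIMIT = 1000  # largest number of candles this endpoint returns per request


def klines(symbol, interval, start_ms, limit=MAX_LIMIT):
    """Fetch one page of raw candles starting at start_ms (epoch milliseconds)."""
    query = urllib.parse.urlencode(
        {"symbol": symbol, "interval": interval, "startTime": start_ms, "limit": limit}
    )
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=30) as response:
        return json.load(response)


def most_recent_market_data(symbol, interval, start_ms=0, fetch=klines):
    """Chain requests from start_ms until the exchange has nothing newer.

    `fetch` is a hypothetical hook so the loop can be exercised without network access.
    """
    candles = []
    while True:
        page = fetch(symbol, interval, start_ms)
        if not page:
            break
        candles.extend(page)
        # Resume just after the open time of the last candle received,
        # so pages neither overlap nor leave a gap.
        start_ms = page[-1][0] + 1
        if len(page) < MAX_LIMIT:
            break  # a short page means we reached the most recent candle
    return candles
```

Calling `most_recent_market_data("BTCUSDT", "1m")` would then walk through the whole history page by page. Passing the open time of the last stored candle plus one millisecond as `start_ms` restricts the download to the missing candles only.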

Step 2: Create a utils.py file that will manage the local data files

This file’s purpose is to be a small toolbox, containing various functions to manipulate files, define the naming of these files, or read only one line of a file. I want to stress that the read_last_line() function is defined in a way that allows quick reading of long and heavy files. It will work even if you want to work with 5 years of 1-minute candlesticks. This is quite powerful compared to the naive solution of loading the whole CSV file as a Pandas DataFrame.
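One common way to read the last line quickly is to seek to the end of the file and walk backwards until the previous line break. The sketch below illustrates that technique; it is an assumption about how such a function can be written, not necessarily the exact implementation in my utils.py.

```python
import os


def read_last_line(path):
    """Return the last non-empty line of a text file without reading the whole file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end == 0:
            return ""  # empty file
        pos = end - 1
        # Skip any trailing line breaks at the very end of the file.
        while pos > 0:
            f.seek(pos)
            if f.read(1) not in (b"\n", b"\r"):
                break
            pos -= 1
        # Walk backwards until the byte just before pos is a line break.
        while pos > 0:
            f.seek(pos - 1)
            if f.read(1) == b"\n":
                break
            pos -= 1
        f.seek(pos)
        return f.readline().decode().rstrip("\r\n")
```

Only the bytes of the final line are read, so the cost depends on the length of that line rather than on the size of the file.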

NB: you can adapt this file if you want to put your data in a different folder than the one I chose, or use a different naming rule for the files.
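The summary above also mentions a config.py file that specifies the cryptocurrency pairs and timeframes of interest. As a rough idea of what such a file might contain, here is a hypothetical example; the variable names, the selected pairs, and the folder name are all illustrative assumptions.

```python
# config.py (hypothetical contents, for illustration only)

# Trading pairs to download, written as Binance symbols.
PAIRS = ["BTCUSDT", "ETHUSDT"]

# Candle intervals to download, written as Binance interval strings.
TIMEFRAMES = ["1m", "1h", "1d"]

# Folder in which the CSV files are stored (assumed name).
DATA_DIR = "data"
```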

Step 4: Create a main.py file that will manage everything

Now for the final step: putting it all together. This last file is the one to execute in your terminal. Each time you execute it, it will:

Loop through all pairs and timeframes

Check if a CSV file already exists for each (pair, timeframe) combination

If so, it will read the last line (most recent candle) and pull the candles coming after it from the Binance API.

Else, it will pull the entire data available on the Binance API and store it in a dedicated CSV file.
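The loop described above can be sketched as follows. This is a simplified illustration under several assumptions: the CSV files have no header row, the first column holds the candle open time in milliseconds, files are named `<pair>_<timeframe>.csv`, and `fetch` is a hypothetical callable that returns all raw candles from a given start time. For brevity, the helper here streams through the file to find the last row; on large files, a seek-based read_last_line() as described in Step 2 is the faster choice.

```python
import csv
import os


def last_open_time(path):
    """Return the open time (ms) stored in the first column of the last row."""
    last = None
    with open(path, newline="") as f:
        for last in csv.reader(f):
            pass  # keep only the final row; memory use stays constant
    return int(last[0])


def update_all(pairs, timeframes, data_dir, fetch):
    """Create or extend one CSV file per (pair, timeframe) combination."""
    os.makedirs(data_dir, exist_ok=True)
    for pair in pairs:
        for timeframe in timeframes:
            path = os.path.join(data_dir, f"{pair}_{timeframe}.csv")
            if os.path.exists(path) and os.path.getsize(path) > 0:
                # File exists: resume right after the most recent stored candle.
                start_ms = last_open_time(path) + 1
            else:
                # No data yet: ask for the full history.
                start_ms = 0
            candles = fetch(pair, timeframe, start_ms)
            with open(path, "a", newline="") as f:
                csv.writer(f).writerows(candles)
```

Because new rows are appended rather than rewritten, an update only costs the download of the missing candles plus one lookup of the last stored timestamp.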

Step 5: Run the script

To execute the script, you only need to open a terminal in the folder where you put these files and type:

python main.py

Remember that the more pairs you want, and the smaller the timeframes, the longer the script will run. The first run can take several hours. The advantage is that once a file is created for a given (pair, timeframe) combination, it is a lot quicker to update.

Note that you can also find a GitHub repository of this program on my GitHub page: BINANCE_DATA_READER.