Carsten Klein

Summary

The web content provides a comprehensive guide on obtaining high-resolution historical cryptocurrency trading data (OHLC) at a 1-minute resolution using the Bitfinex exchange API for algorithmic trading purposes.

Abstract

The article titled "How to get historical cryptocurrency data" explains the process of acquiring minute-level Open, High, Low, Close (OHLC) data for cryptocurrency trading pairs, which is essential for backtesting algorithmic trading strategies. It highlights the limitations of existing data sources and presents a Python-based solution using the Bitfinex API, which provides free access to public data endpoints. The author details the steps to install the necessary API client, execute API calls to retrieve historical data, and handle API request limits. Additionally, the article provides a Python function to collect extended periods of OHLC data by segmenting queries to comply with API rate limits and demonstrates how to process and store the data in a Pandas data frame for further analysis. The author also shares a dataset on Kaggle and links to a script for collecting comprehensive historical data for all currency pairs available on Bitfinex.

Opinions

  • The author believes that obtaining high-resolution OHLC data is crucial for developing automated trading strategies in the volatile cryptocurrency markets.
  • The article suggests that while there are several sources for historical cryptocurrency data, most are not ideal due to costs, low temporal resolution, or limited data scope.
  • The author emphasizes the ease of accessing the Bitfinex API, even without an account, due to its public endpoints.
  • The author provides an opinion that using the Bitfinex API client simplifies the process of interacting with the exchange's API, as it eliminates the need for manual API interface creation.
  • The author's approach to collecting data for longer intervals acknowledges the importance of respecting API rate limits to avoid service disruptions.
  • By sharing a public dataset on Kaggle and providing a script for data collection, the author demonstrates a commitment to community engagement and open-source practices.
  • The author encourages further engagement by directing readers to their GitHub repository, Twitter, and LinkedIn profiles.

How to get historical cryptocurrency data

Downloading minute resolution OHLC data via exchange APIs

UPDATE:

Because of the general interest in this matter I created a dataset including all OHLC data from the Bitfinex exchange API and uploaded it as a public dataset on Kaggle.

Introduction

Algorithmic trading is a popular way to tackle the fast-paced and volatile environment of cryptocurrency markets. However, implementing an automated trading strategy is challenging and requires a lot of backtesting, which in turn requires a lot of historical data. While there are several sources available that provide historical cryptocurrency data, most of them have drawbacks. They are either expensive, provide only low temporal resolution data (daily), or cover limited time periods for a limited number of currency pairs. Here we will see that obtaining historical open, high, low, close (OHLC) data at a 1-minute resolution is actually not a magical task and can be done in a few lines of Python code for free.


Connecting to the exchange

In this tutorial, we will use the Bitfinex exchange API to retrieve historical data. However, the approach should also work for any other exchange that provides a similar API. You do not need a Bitfinex account for this code to work since we will only use public API endpoints. In case you are not familiar with what an API is or how to use it, I suggest you read through the Bitfinex API documentation. After all, this is also the interface through which your algorithm will later interact with the exchange. But do not worry, you won't need to write the Python interface for the Bitfinex API yourself. There are already several implementations available, one of them being this client here. The easiest installation is via pip:

$ pip install bitfinex-tencars

Alternatively, if you have Git installed you can simply run the commands below to install the client. Just remember to replace <folder> with your target folder.

$ git clone https://github.com/akcarsten/bitfinex_api.git <folder>
$ cd <folder>
$ python setup.py install

If you do not have Git installed, you can download the repository as a ZIP archive from the GitHub page, extract it, then go to the extracted folder and run:

$ python setup.py install

In both cases, the Bitfinex client will be installed to your Python distribution.

Using the API client

If you look at the Bitfinex API documentation, you will see that there are two API versions, v1 and v2. Both are implemented in the client you just installed, but here we will only use the v2 API. So after importing the Bitfinex API client, we need to create an instance of the v2 API by running the code below. Notice that we are not providing any keys here, so we will only have access to the public endpoints. A corresponding message will be shown after running the code.

>>> import bitfinex
 
>>> # Create api instance of the v2 API
>>> api_v2 = bitfinex.bitfinex_v2.api_v2()

And that is our gate to the data. From the documentation, we know that one of the public endpoints is called candles, which returns the data behind the candlestick charts that you see on all the exchanges. This kind of data contains the following information: a timestamp, the open, close, high and low prices, and the trade volume. It is also referred to as OHLC data. The simplest way to interact with this endpoint through the client is to just call it with its default settings.

>>> result = api_v2.candles()

The line above will give you the last 1000 minutes of OHLC data for the Bitcoin price in USD. Well, that’s nice but we might be interested in a time period long ago or a different currency pair. In this case, we can specify additional parameters to get exactly what we want. And these parameters are:

  • symbol: currency pair, default: BTCUSD
  • interval: temporal resolution, e.g. 1m for 1 minute of OHLC data
  • limit: number of returned data points, default: 1000
  • start: start time of interval in milliseconds since 1970
  • end: end time of interval in milliseconds since 1970
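For orientation, the parameters above correspond roughly to the path and query string of the public candles endpoint. The sketch below only builds such a URL and does not send a request. The host name, the path layout and the `t` prefix on the symbol are assumptions based on the public Bitfinex v2 REST documentation, not on the client's source code, and `build_candles_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL of the public Bitfinex v2 REST API.
BASE_URL = "https://api-pub.bitfinex.com/v2"


def build_candles_url(symbol="BTCUSD", interval="1m", limit=1000,
                      start=None, end=None):
    """Build (but do not request) a candles URL; hypothetical helper."""
    # Trading pairs are assumed to carry a leading 't', e.g. tBTCUSD.
    path = f"/candles/trade:{interval}:t{symbol.upper()}/hist"
    params = {"limit": limit}
    # start and end are optional, in milliseconds since 1970.
    if start is not None:
        params["start"] = int(start)
    if end is not None:
        params["end"] = int(end)
    return f"{BASE_URL}{path}?{urlencode(params)}"


print(build_candles_url(symbol="btcusd", start=1522540800000,
                        end=1522627200000))
```

The client hides this detail, so you normally never need to build the URL yourself. It is shown here only to make clear what each parameter controls.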

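So with this information at hand, we can run the first query. The code below requests the 1-minute resolution OHLC data of the Bitcoin price in USD between April 1 and April 2, 2018. Note that one day has 1440 minutes while the limit is 1000 data points, so a single response cannot cover the whole day.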

>>> import datetime
>>> import time
>>> # Define query parameters
>>> pair = 'btcusd' # Currency pair of interest
>>> bin_size = '1m' # This will return minute data
>>> limit = 1000    # We want the maximum of 1000 data points 
>>> # Define the start date
>>> t_start = datetime.datetime(2018, 4, 1, 0, 0)
>>> t_start = time.mktime(t_start.timetuple()) * 1000
>>> # Define the end date
>>> t_stop = datetime.datetime(2018, 4, 2, 0, 0)
>>> t_stop = time.mktime(t_stop.timetuple()) * 1000
>>> result = api_v2.candles(symbol=pair, interval=bin_size,  
>>>                         limit=limit, start=t_start, end=t_stop)
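One caveat with the snippet above: `time.mktime` interprets the date in the local time zone of your machine, so the same code can request slightly shifted windows on different computers. A minimal sketch of a UTC-based conversion follows. `to_millis` is a hypothetical helper name, and treating naive dates as UTC is an assumption about what you want, not something the client requires.

```python
import datetime


def to_millis(dt):
    """Convert a datetime to milliseconds since 1970-01-01 UTC."""
    # Naive datetimes are treated as UTC here (an assumption).
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    return int(dt.timestamp() * 1000)


t_start = to_millis(datetime.datetime(2018, 4, 1, 0, 0))
t_stop = to_millis(datetime.datetime(2018, 4, 2, 0, 0))
print(t_start, t_stop)
```

The two values can be passed as start and end exactly like the ones computed with `time.mktime`.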

Collecting historical data for longer time intervals

Now that’s great, but there is still one problem: the API will only return a maximum of 1000 data points per request. So if we were to increase the time interval of interest to the entire month of April 2018, we would not be able to get it at a 1-minute resolution in a single query. To get past this limitation we need to write a function that splits our big query into multiple smaller ones. One additional thing we need to keep in mind is that there is a limit on how many requests we can make to the Bitfinex API. At the time of writing, this limit is 60 calls per minute, which means that after each request we should wait for a minimum of 1 second before we start the next one. To be safe, the function below waits 2 seconds, but you can change that if you want.

>>> def fetch_data(start, stop, symbol, interval, tick_limit, step):
>>>     # Create api instance
>>>     api_v2 = bitfinex.bitfinex_v2.api_v2()
>>>     data = []
>>>     while start < stop:
>>>         # Query one window and never request data beyond stop
>>>         end = min(start + step, stop)
>>>         res = api_v2.candles(symbol=symbol, interval=interval,
>>>                              limit=tick_limit, start=start,
>>>                              end=end)
>>>         data.extend(res)
>>>         # Move to the next window and respect the rate limit
>>>         start = end
>>>         time.sleep(2)
>>>     return data

With the function above we can now run queries for longer time intervals. The only extra thing we need to provide is the step size in milliseconds, that is, the time span covered by each of the smaller queries. This is basically the same as the limit we defined earlier, but expressed in milliseconds. To reduce the number of calls to the API we should go for the maximum, which for the 1-minute case means a step size of 60000 * 1000 = 60000000.

>>> # Set step size
>>> time_step = 60000000
>>> # Define the start date 
>>> t_start = datetime.datetime(2018, 4, 1, 0, 0)
>>> t_start = time.mktime(t_start.timetuple()) * 1000
>>> # Define the end date
>>> t_stop = datetime.datetime(2018, 5, 1, 0, 0)
>>> t_stop = time.mktime(t_stop.timetuple()) * 1000
>>> pair_data = fetch_data(start=t_start, stop=t_stop, symbol=pair,
>>>                        interval=bin_size, tick_limit=limit, 
>>>                        step=time_step)
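The windowing arithmetic can be checked without touching the API. In the sketch below, `split_range` is a hypothetical helper that only mirrors the idea of cutting a long interval into steps. It clamps the last window to the stop time, which is a design choice of this sketch.

```python
def split_range(start_ms, stop_ms, step_ms):
    """Yield (start, end) windows that together cover start_ms..stop_ms."""
    current = start_ms
    while current < stop_ms:
        # The last window is shortened so it never runs past stop_ms.
        window_end = min(current + step_ms, stop_ms)
        yield current, window_end
        current = window_end


minute_ms = 60 * 1000             # one minute in milliseconds
step = minute_ms * 1000           # 1000 one-minute candles per request
month_ms = 30 * 24 * 60 * minute_ms  # a 30-day month

# Offsets relative to the start date are enough to count the windows.
windows = list(split_range(0, month_ms, step))
print(len(windows), "windows")
print(len(windows) * 2 / 60, "minutes of waiting at 2 seconds per request")
```

A 30-day month contains 43200 minutes, so this sketch yields 44 windows: 43 full ones and a shorter final one.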

Finally, let’s convert the results into a Pandas data frame so we can remove potential duplicates, make sure everything is in the correct order and convert the timestamp into a readable format.

>>> import pandas as pd
>>>
>>> # Create pandas data frame and clean/format data
>>> names = ['time', 'open', 'close', 'high', 'low', 'volume']
>>> df = pd.DataFrame(pair_data, columns=names)
>>> df.drop_duplicates(inplace=True)
>>> df['time'] = pd.to_datetime(df['time'], unit='ms')
>>> df.set_index('time', inplace=True)
>>> df.sort_index(inplace=True)

Conclusion

So retrieving high-resolution OHLC data is actually not that complicated. And if you wonder for how many currency pairs we can do that through the Bitfinex API, just run the two lines of code below.

>>> api_v1 = bitfinex.bitfinex_v1.api_v1()
>>> pairs = api_v1.symbols()

Now if we were to push it we could write a script like this which collects all the data for each currency pair and saves it to a CSV file. That gives you all the historical OHLC trading data from the Bitfinex exchange at a 1-minute resolution, which should help you develop an automated trading strategy. However, it will take a while until all the data is on your computer, so you should either limit your query to a shorter time frame or be more selective with your currency pairs.
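Saving collected candles does not need much code. The sketch below writes raw candle rows to a CSV file using only the standard library. It is not the linked script: `save_candles_csv` and the file name are hypothetical, the column order follows the one used for the data frame above, and the two sample rows are made-up placeholder values, not real market data.

```python
import csv
import os
import tempfile

COLUMNS = ["time", "open", "close", "high", "low", "volume"]


def save_candles_csv(rows, path):
    """Write candle rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)


# Placeholder rows for illustration only (not real prices).
sample_rows = [
    [1522540800000, 100.0, 101.0, 102.0, 99.0, 1.5],
    [1522540860000, 101.0, 100.5, 101.5, 100.0, 0.7],
]

# Written to the temp directory so the example leaves no stray files behind.
path = os.path.join(tempfile.gettempdir(), "btcusd_sample.csv")
written = save_candles_csv(sample_rows, path)
print(written, "rows written to", path)
```

If the data is already in a Pandas data frame, `df.to_csv(path)` achieves the same result in one line.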

I hope that helps and you can check out the code here, follow me on Twitter or connect via LinkedIn.
