Web Scraping Cryptocurrency 1-Minute Price Data (Python)
Step-by-step guide to scraping different cryptocurrency prices from CoinDesk.

Goal
This article shows how to use Python to scrape 1-minute cryptocurrency prices from CoinDesk. Usually, only hourly or daily data is readily available, which is fine if you are working on a long-term investment strategy. However, if you trade on a day-to-day basis, timely, short-interval data is a big benefit, because it helps you analyze the upcoming trend and recognize patterns. Therefore, in the following, I will show you how to pull 1-minute price data from a well-known cryptocurrency information platform: CoinDesk.

Cryptocurrency List
To scrape cryptocurrency prices, we first need a list of cryptocurrencies. There were nearly 6000 cryptocurrencies as of August 2021, and not all of them are suitable for trading in terms of market cap and liquidity. “The CoinDesk 20” is a good starting point: it filters the larger universe of thousands of cryptocurrencies and digital assets down to a core group of 20. In the rest of this article, I will scrape the prices of “The CoinDesk 20” as a demonstration, and the upcoming tutorials will be based on this set of assets as well. (Please note that ‘MATIC’ is not included, since CoinDesk only provides ‘MATIC’ price data from July 2021 onwards, which is insufficient for the analysis.) If you are interested in which other cryptocurrencies CoinDesk supports, you can refer to its website.


```python
# The CoinDesk 20 (excluding MATIC)
coindesk20_list = ['BTC', 'ETH', 'XRP', 'ADA', 'USDT', 'DOGE', 'XLM', 'DOT', 'UNI', 'LINK', 'USDC', 'BCH', 'LTC', 'GRT', 'ETC', 'FIL', 'AAVE', 'ALGO', 'EOS']
```
CoinDesk API
Before we start scraping prices, we first have to understand the CoinDesk API. The easiest way to do so is to look at the price charts shown on the CoinDesk website.

From the charts, we can easily observe that the ‘12h’ chart uses 1-minute data, while the ‘1w’ chart uses 1-hour data. I have summarized this in the table below.

Since I am interested in 1-minute price data, I use the ‘12h’ chart as the basis. Taking Bitcoin as an example, the CoinDesk API URL has the following structure:
https://production.api.coindesk.com/v2/price/values/BTC?start_date=2021-08-20T15:42&end_date=2021-08-21T03:42&ohlc=true
Four parameters can be set in this URL:
- Cryptocurrency symbol (the path segment after /values/, e.g. BTC)
- Price data start time (start_date)
- Price data end time (end_date)
- Open-High-Low-Close flag (ohlc)
Please note that (1) times are in UTC (UTC+0), (2) the gap between the start time and the end time should not exceed 12 hours if you want minute-level data, and (3) if ohlc is set to false, only the closing price is returned.
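The URL rules above can be captured in a small helper. This is a minimal sketch: `build_coindesk_url` and `MAX_MINUTE_WINDOW` are hypothetical names (not part of any library), the 12-hour limit comes from the remarks above, and the start/end datetimes are assumed to be naive values already expressed in UTC.

```python
from datetime import datetime, timedelta

# Base URL from the Bitcoin example above; the symbol goes in the path.
BASE_URL = 'https://production.api.coindesk.com/v2/price/values/'
# Hypothetical constant: longest window that still returns 1-minute data.
MAX_MINUTE_WINDOW = timedelta(hours=12)


def build_coindesk_url(symbol: str, start: datetime, end: datetime, ohlc: bool = True) -> str:
    """Build a CoinDesk price-values URL (hypothetical helper).

    start and end are naive datetimes assumed to be in UTC.
    """
    if end <= start:
        raise ValueError('end must be after start')
    if end - start > MAX_MINUTE_WINDOW:
        raise ValueError('windows longer than 12 hours do not return 1-minute data')
    fmt = '%Y-%m-%dT%H:%M'  # minute precision, as in the example URL
    return (BASE_URL + symbol
            + '?start_date=' + start.strftime(fmt)
            + '&end_date=' + end.strftime(fmt)
            + '&ohlc=' + ('true' if ohlc else 'false'))


url = build_coindesk_url('BTC', datetime(2021, 8, 20, 15, 42), datetime(2021, 8, 21, 3, 42))
print(url)  # reproduces the example URL above
```

Raising early on an over-long window makes the 12-hour constraint explicit instead of silently receiving coarser data.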
Data Scraping
With the cryptocurrency list in hand and the API structure understood, we can now start scraping the prices.
```python
# Import libraries
import requests
import numpy as np
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

# The CoinDesk 20 (excluding MATIC)
coindesk20_list = ['BTC', 'ETH', 'XRP', 'ADA', 'USDT', 'DOGE', 'XLM', 'DOT', 'UNI', 'LINK', 'USDC', 'BCH', 'LTC', 'GRT', 'ETC', 'FIL', 'AAVE', 'ALGO', 'EOS']

coin_df_list = []  # one data frame per coin, merged into raw_df at the end
for coin in coindesk20_list:
    window_df_list = []  # 12-hour windows for this coin, collected newest first
    # Define the end date and the checkpoint (start date); walk backwards in 12-hour steps
    end_datetime = datetime(2021, 8, 1, 0, 0)
    datetime_checkpt = datetime(2021, 7, 1, 0, 0)
    while end_datetime > datetime_checkpt:
        start_datetime = end_datetime - relativedelta(hours=12)
        url = ('https://production.api.coindesk.com/v2/price/values/' + coin
               + '?start_date=' + start_datetime.strftime('%Y-%m-%dT%H:%M')
               + '&end_date=' + end_datetime.strftime('%Y-%m-%dT%H:%M')
               + '&ohlc=true')
        temp_data_json = requests.get(url, timeout=30)
        temp_data_json.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        temp_data = temp_data_json.json()
        df = pd.DataFrame(temp_data['data']['entries'])
        if df.empty:
            break  # no data returned for this window, so stop going further back
        df.columns = ['Timestamp', 'Open', 'High', 'Low', 'Close']

        # Handle the missing data: copy the minute before each gap into the missing minutes
        timestamp_checking_array = np.diff(df['Timestamp'].to_numpy())  # 60000 = 1 minute
        insert_idx_list = np.where(timestamp_checking_array > 60000)[0]
        if len(insert_idx_list) > 0:
            print(f'{coin}: {len(insert_idx_list)} timestamp gap(s) found.')
        # Fill from the last gap backwards so earlier row positions stay valid
        for insert_idx in insert_idx_list[::-1]:
            n_missing = int(timestamp_checking_array[insert_idx] // 60000) - 1
            temp_df = df.iloc[[insert_idx] * n_missing].reset_index(drop=True)
            temp_df['Timestamp'] = df['Timestamp'].iloc[insert_idx] + 60000 * np.arange(1, n_missing + 1)
            df = pd.concat([df.iloc[:insert_idx + 1], temp_df, df.iloc[insert_idx + 1:]], ignore_index=True)

        # Deduce the Datetime column: the last row sits one minute before end_datetime
        df = df.drop(['Timestamp'], axis=1)
        df['Datetime'] = [end_datetime - relativedelta(minutes=len(df) - i) for i in range(0, len(df))]
        window_df_list.append(df)
        end_datetime = start_datetime

    if not window_df_list:
        continue  # nothing was returned for this coin
    # Reverse the windows into chronological order before concatenating
    coin_df = pd.concat(window_df_list[::-1], ignore_index=True)
    coin_df['Symbol'] = coin
    coin_df_list.append(coin_df)

# Merge all coins into one long-format data frame
raw_df = pd.concat(coin_df_list, ignore_index=True)
raw_df = raw_df[['Datetime', 'Symbol', 'Open', 'High', 'Low', 'Close']]
raw_df.to_csv('raw_df.csv', index=False)
```
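The outer while loop walks backwards from end_datetime in 12-hour steps until it reaches datetime_checkpt. As a sketch, the same windows can be generated up front with a small generator; `twelve_hour_windows` is a hypothetical name, and clipping the last window at the checkpoint is an addition of mine for ranges that are not a whole multiple of 12 hours.

```python
from datetime import datetime, timedelta


def twelve_hour_windows(end: datetime, checkpoint: datetime, step: timedelta = timedelta(hours=12)):
    """Yield (start, end) pairs walking backwards from end to checkpoint (hypothetical helper).

    The last window is clipped at checkpoint so the loop always terminates exactly there.
    """
    while end > checkpoint:
        start = max(end - step, checkpoint)
        yield start, end
        end = start


windows = list(twelve_hour_windows(datetime(2021, 8, 1), datetime(2021, 7, 1)))
print(len(windows))  # 62 windows of 12 hours cover the 31 days of July 2021
print(windows[0])    # newest window first
print(windows[-1])   # oldest window last
```

Generating the windows separately makes it easy to count the requests in advance (62 per coin here, so 19 × 62 = 1178 requests for the whole list).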
Broadly speaking, the code can be divided into four parts.
1. Get the JSON data from API
```python
temp_data_json = requests.get(url, timeout=30)
temp_data_json.raise_for_status()
temp_data = temp_data_json.json()
df = pd.DataFrame(temp_data['data']['entries'])
df.columns = ['Timestamp', 'Open', 'High', 'Low', 'Close']
```
The requests package lets us easily pull the JSON data from the API. We then store the entries in a pandas data frame and rename the columns.
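To see what this step produces without calling the API, here is a sketch on a hypothetical payload. Only the nesting (data, then entries) and the five values per entry are inferred from the code above; the numbers are made up, and `entries_to_frame` is a hypothetical helper that also guards against an empty response.

```python
import pandas as pd

# Hypothetical payload: structure inferred from the scraper, values invented for illustration.
sample_json = {
    'data': {
        'entries': [
            [1627776000000, 100.0, 101.0, 99.5, 100.5],
            [1627776060000, 100.5, 102.0, 100.0, 101.5],
            [1627776120000, 101.5, 101.8, 100.9, 101.0],
        ]
    }
}

COLUMNS = ['Timestamp', 'Open', 'High', 'Low', 'Close']


def entries_to_frame(payload: dict) -> pd.DataFrame:
    """Turn the 'entries' list into a DataFrame with named OHLC columns (hypothetical helper)."""
    df = pd.DataFrame(payload['data']['entries'])
    if df.empty:
        # Renaming zero columns to five names would fail, so return an empty frame instead
        return pd.DataFrame(columns=COLUMNS)
    df.columns = COLUMNS
    return df


df = entries_to_frame(sample_json)
print(df)
```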
2. Handle the missing data
```python
timestamp_checking_array = np.diff(df['Timestamp'].to_numpy())  # 60000 = 1 minute
insert_idx_list = np.where(timestamp_checking_array > 60000)[0]
if len(insert_idx_list) > 0:
    print(f'{coin}: {len(insert_idx_list)} timestamp gap(s) found.')
# Fill from the last gap backwards so earlier row positions stay valid
for insert_idx in insert_idx_list[::-1]:
    n_missing = int(timestamp_checking_array[insert_idx] // 60000) - 1
    temp_df = df.iloc[[insert_idx] * n_missing].reset_index(drop=True)
    temp_df['Timestamp'] = df['Timestamp'].iloc[insert_idx] + 60000 * np.arange(1, n_missing + 1)
    df = pd.concat([df.iloc[:insert_idx + 1], temp_df, df.iloc[insert_idx + 1:]], ignore_index=True)
```
This part is the trickiest one, because I found that CoinDesk does not always capture every minute of data. Under normal circumstances, consecutive timestamps differ by 60000, which corresponds to 1 minute. Therefore, whenever the difference between two consecutive rows is larger than 60000, the gap between them is a missing period. To deal with it, a hot-deck imputation approach is applied: the last available minute before the gap is copied into each missing minute.
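The same fill can also be expressed without an explicit gap loop, by reindexing onto a complete 1-minute grid and forward-filling. This is a swapped-in pandas technique rather than the author's method, sketched under the assumption that timestamps are minute-aligned multiples of 60000 apart; `fill_minute_gaps` is a hypothetical name and the toy data is made up.

```python
import numpy as np
import pandas as pd

ONE_MINUTE = 60000  # spacing between consecutive timestamps, per the text above


def fill_minute_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill missing minutes on a complete 1-minute timestamp grid (hypothetical helper).

    Each missing minute receives a copy of the last available row before the gap.
    """
    df = df.drop_duplicates('Timestamp').sort_values('Timestamp')
    full_grid = np.arange(df['Timestamp'].iloc[0], df['Timestamp'].iloc[-1] + 1, ONE_MINUTE)
    return (df.set_index('Timestamp')
              .reindex(full_grid)   # missing minutes appear as NaN rows
              .ffill()              # copy the previous minute into them
              .rename_axis('Timestamp')
              .reset_index())


# Toy data with the minutes at 120000 and 180000 missing
toy = pd.DataFrame({
    'Timestamp': [0, 60000, 240000],
    'Open': [1.0, 2.0, 5.0],
    'High': [1.5, 2.5, 5.5],
    'Low': [0.5, 1.5, 4.5],
    'Close': [1.2, 2.2, 5.2],
})
filled = fill_minute_gaps(toy)
print(filled)
```

Dropping duplicates first matters because reindex refuses a non-unique index; the loop-based version has no such guard.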
3. Add the Datetime and Symbol to the coin_df
```python
df = df.drop(['Timestamp'], axis=1)
df['Datetime'] = [end_datetime - relativedelta(minutes=len(df) - i) for i in range(0, len(df))]
```
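Since the Timestamp column is defined by CoinDesk and is not easily interpreted, instead of writing a time transformation function, I simply deduce the Datetime column from end_datetime: the last row of each 12-hour window is placed one minute before end_datetime, and each earlier row one minute before that. This only works because the missing minutes have already been filled in, so every row represents exactly one minute.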
```python
coin_df['Symbol'] = coin
```
The cryptocurrency symbol is also added to coin_df.
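The Datetime deduction in step 3 can be checked on a toy frame. The second half is an alternative that rests on an assumption the article does not confirm: that CoinDesk's Timestamp is Unix epoch time in milliseconds (UTC), which is consistent with 60000 per minute. If that holds, `pd.to_datetime(..., unit='ms')` converts it directly, and comparing both columns on real data is a quick way to catch an off-by-one-minute alignment in the deduction.

```python
from datetime import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

end_datetime = datetime(2021, 8, 1, 0, 0)
df = pd.DataFrame({'Close': [10.0, 11.0, 12.0]})  # three consecutive minutes, made-up prices

# Deduction used by the scraper: the last row sits one minute before end_datetime,
# and each earlier row is one more minute earlier.
df['Datetime'] = [end_datetime - relativedelta(minutes=len(df) - i) for i in range(len(df))]
print(df['Datetime'].tolist())

# Alternative, ASSUMING Timestamp is Unix epoch milliseconds in UTC (hypothetical values).
timestamps = pd.Series([1627775820000, 1627775880000, 1627775940000])
converted = pd.to_datetime(timestamps, unit='ms')
print(converted.tolist())
```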
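4. Merge the coin_df into raw_df
```python
coin_df_list.append(coin_df)
raw_df = pd.concat(coin_df_list, ignore_index=True)
```
Lastly, each coin's data frame is collected in a list and merged into one consolidated dataset called raw_df. (pd.concat is used because DataFrame.append was removed in pandas 2.0.)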

Cryptocurrency Dataframe
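Finally, we can reshape raw_df into the cryptocurrency data frame, with one column of closing prices per cryptocurrency. This reshape assumes that every coin has the same number of rows, stored coin by coin in chronological order, which is how raw_df is built above.
```python
cryptocurrency_df = pd.DataFrame(
    raw_df['Close'].values.reshape(len(coindesk20_list), -1).transpose(),
    index=raw_df['Datetime'][:len(raw_df) // len(coindesk20_list)],
    columns=coindesk20_list)
cryptocurrency_df.to_csv('cryptocurrency_df.csv')
```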
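Because the reshape depends on row order and equal lengths, a pivot on the Datetime and Symbol columns is a more defensive alternative: rows are aligned by label, so a coin with missing minutes shows up as NaN instead of silently shifting prices. The sketch below compares both approaches on made-up data for two coins; `coin_list` stands in for coindesk20_list.

```python
import pandas as pd

# Toy long-format data shaped like raw_df (made-up prices, two coins, two minutes)
raw_df = pd.DataFrame({
    'Datetime': pd.to_datetime(['2021-07-01 00:00', '2021-07-01 00:01'] * 2),
    'Symbol': ['BTC', 'BTC', 'ETH', 'ETH'],
    'Close': [100.0, 101.0, 10.0, 11.0],
})
coin_list = ['BTC', 'ETH']

# Reshape approach: relies on equal row counts and coin-by-coin chronological ordering
reshaped = pd.DataFrame(raw_df['Close'].values.reshape(len(coin_list), -1).transpose(),
                        index=raw_df['Datetime'][:len(raw_df) // len(coin_list)],
                        columns=coin_list)

# Pivot approach: aligns rows by Datetime label, keeping the original column order
pivoted = raw_df.pivot(index='Datetime', columns='Symbol', values='Close')[coin_list]
print(pivoted)
```

On well-formed data both give the same values; the pivot simply fails loudly (duplicate entries) or leaves NaN gaps when the input is not.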

This brings us to the end of the tutorial. You can now move on to the next part to see how to analyze the risk and return of these cryptocurrencies. =)