Michelangiolo Mazzeschi

Summary

The website content outlines a comprehensive guide on predicting Bitcoin prices using machine learning, specifically Long Short-Term Memory (LSTM) networks, with data mining from blockchain.com and subsequent data processing and analysis.

Abstract

The content provided is a detailed tutorial on using machine learning to forecast Bitcoin prices. It begins by emphasizing the unique nature of Bitcoin's price, which is influenced by demand and supply rather than external factors, making pattern recognition crucial for predictive analysis. The author breaks down the process into three parts: the first part covers data mining from blockchain.com, the second part describes setting up LSTM for price prediction, and the third part will discuss the LSTM model's accuracy. The tutorial includes importing necessary libraries, defining functions for data manipulation, and instructions on downloading and parsing JSON data from blockchain.com. It also explains how to preprocess and group the data by weeks, merge different datasets, and prepare the data for LSTM analysis, which will be executed in the subsequent parts of the article.

Opinions

  • The author believes that Bitcoin's price is heavily influenced by perceived trends rather than traditional economic indicators.
  • The author suggests that pattern recognition through machine learning, particularly with LSTM networks, is a suitable approach for Bitcoin price prediction.
  • Manual intervention is minimized in the code, with the only significant manual step being the specification of the initial week for data counting.
  • The author provides a function to download data from blockchain.com without the need for an API token, highlighting the ease of access to Bitcoin data.
  • The tutorial is designed to be elaborate, offering the flexibility to download and parse various charts available on blockchain.com, even though only one variable (market price) will be used for the LSTM analysis.
  • The author acknowledges the absence of some graphs on the website and provides a method to manually download and import JSON data for those cases.
  • The author emphasizes the importance of merging different datasets using weeks as a common factor to facilitate a comprehensive analysis.
  • The content reflects the author's expertise in handling and processing Bitcoin data for machine learning applications, particularly for LSTM-based price predictions.

Machine Learning, Recurrent Neural Networks

Bitcoin Price Prediction with LSTM using Q Blocks (Part I)

Mining Bitcoin Data. Full code available in my GitHub repository.

Bitcoin is a very particular asset. Its price is sensitive to demand and supply rather than external factors, so it may depend heavily on perceived trends rather than perceived information. For this category of problems, pattern recognition may prove incredibly useful.

Part I (this article)

Because this problem is too big to cover from beginning to end in one go, I will start with the first part of the article: Mining Bitcoin Data.

Part II (next article)

In the second part, I will set up an LSTM to predict the upcoming Bitcoin price. Because a regular notebook won't have enough computing power to run the code, I will rent computing power from Q Blocks. In the next article, I will devote a full section to connecting to their notebook to speed up performance using peer-to-peer distributed GPUs.

Part III

After setting everything up to run on the Q Blocks platform, I will explain in detail how the LSTM model works and estimate the accuracy of the final model.

Importing Libraries

import json
import urllib.request
import pandas as pd

Importing Functions

With the following functions I will be able to decode UTF-8 data and group the data into chunks (weeks, specifically). I won't go into detail for now, but the code further down relies on them.

def group_chunks(df, id_loc, value_loc):  # e.g. group_chunks(df, 0, 1)
  # average all the values that share the same id (a date or a week number)
  def average(list1):
    return sum(list1) / len(list1)
  # convert the DataFrame into a dict holding a list of values per id
  mydict = {}
  for x in range(len(df)):
    currentid = df.iloc[x, id_loc]
    currentvalue = df.iloc[x, value_loc]
    mydict.setdefault(currentid, []).append(currentvalue)
  # convert the dict into a list of [id, values] pairs
  dictlist = [[key, value] for key, value in mydict.items()]
  # convert to a DataFrame with column 0 (id) and column 1 (list of values)
  dictlist = pd.DataFrame(dictlist)
  # replace each list of values with its average
  dictlist[1] = dictlist[1].apply(average)
  return dictlist
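For comparison, the same "average every value that shares an id" step can be sketched with pandas' built-in groupby. This is a minimal sketch on made-up numbers, not the function used in the rest of the article; the column names day and price are assumptions for illustration.

```python
import pandas as pd

# Toy data (made-up values): two rows share the same day
toy = pd.DataFrame({
    "day": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "price": [100.0, 200.0, 300.0],
})

# Average every value that shares the same id, keeping first-seen order
grouped = toy.groupby("day", sort=False)["price"].mean().reset_index()
print(grouped)
```

The groupby version keeps named columns, whereas group_chunks returns a DataFrame whose columns are labelled 0 and 1.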

Specifying initial week

The only manual intervention this code requires is specifying the year from which the weeks are counted. Because I am downloading data for the last 5 years, I will specify 2015 as the starting year.

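import datetime
# ***change this to set the year from which weeks are counted
benchmark_date = 2015
def convert_week(date):
  # use the ISO year together with the ISO week: a date such as 2018-12-31
  # belongs to week 1 of ISO year 2019, not to calendar year 2018
  iso_year, iso_week, _ = datetime.datetime.strptime(date, '%Y-%m-%d').isocalendar()
  # approximation: every year counts as 52 weeks, so week 53 of a long ISO year
  # gets the same number as week 1 of the following year
  return iso_week + (iso_year - benchmark_date)*52
convert_week('2018-01-01')  # 157
convert_week('2019-01-01')  # 209
def convert_week_back(week_n):
  # inverse of convert_week: returns the Monday of the given week
  year_offset, week = divmod(week_n - 1, 52)
  d = str(benchmark_date + year_offset) + '-W' + str(week + 1)
  # %G, %V and %u are the ISO year, ISO week and ISO weekday (Python 3.6+)
  return datetime.datetime.strptime(d + '-1', '%G-W%V-%u')
convert_week_back(129)  # Monday 2017-06-19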
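Counting 52 weeks per year is an approximation, because some ISO years (2015 among them) have 53 weeks. One way around this, sketched here under the assumption that Python 3.8+ is available (for date.fromisocalendar), is to count whole weeks from the Monday of ISO week 1 of the benchmark year. The names week_index and week_index_back are hypothetical and not part of the article's code.

```python
import datetime

BENCHMARK_YEAR = 2015  # same role as benchmark_date above

def week_index(date_str, benchmark_year=BENCHMARK_YEAR):
    """Hypothetical helper: 1-based count of weeks since ISO week 1 of benchmark_year."""
    day = datetime.date.fromisoformat(date_str)
    start = datetime.date.fromisocalendar(benchmark_year, 1, 1)  # Monday of ISO week 1
    return (day - start).days // 7 + 1

def week_index_back(week_n, benchmark_year=BENCHMARK_YEAR):
    """Inverse: the Monday on which week number week_n starts."""
    start = datetime.date.fromisocalendar(benchmark_year, 1, 1)
    return start + datetime.timedelta(weeks=week_n - 1)

# Week 53 of 2015 and week 1 of 2016 now get different numbers
print(week_index("2015-12-31"), week_index("2016-01-04"))
print(week_index_back(week_index("2018-06-15")))
```

With this scheme every week gets its own number, at the cost of no longer being able to read the year off the number by dividing by 52.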

Downloading Function

In my experience, Bitcoin data is easy to obtain online from multiple sources. The issue is that most of it has to be downloaded manually, and the rest is only available through an API.

I will show you how to build a function that connects to https://www.blockchain.com/ and extracts all the relevant data.

sample of the available data on the website

***No registration is required: the data is publicly available in JSON format, so the GET request does not need any API token.

Because I had to use this algorithm for a different project, it is more elaborate than what we actually need. For the LSTM I will be only using one variable (market price), while, with this function, you can automatically download and parse all of the charts available on blockchain.com.
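The request URL used by the function below is a base address, a chart name and three query parameters. As a small illustration, it can be assembled and inspected without sending any request. The helper name build_chart_url is hypothetical; the base URL and the parameter names are the ones that appear in the article's code.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.blockchain.info/charts/"

def build_chart_url(chart_name, timespan="5years", rolling_average="24hours"):
    """Hypothetical helper: assemble the chart URL without sending a request."""
    query = urlencode({
        "timespan": timespan,
        "rollingAverage": rolling_average,
        "format": "json",
    })
    return BASE_URL + chart_name + "?" + query

print(build_chart_url("market-cap"))
```

Keeping the timespan and the rolling average as arguments makes it easy to request a shorter or longer history for a single chart.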

Downloading JSON

def download_data(chart_name, data_=None, compress=True):
  # example URL: https://api.blockchain.info/charts/transactions-per-second?timespan=2years&rollingAverage=8hours&format=json
  if data_ is None:
    with urllib.request.urlopen('https://api.blockchain.info/charts/' + chart_name + '?timespan=5years&rollingAverage=24hours&format=json') as url:

Parsing JSON

I will use the JSON library to convert it into a dictionary. I will then store each element of this dictionary into a pandas DataFrame.

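      response = url.read()
      data = json.loads(response)
  else:
    # read a manually downloaded JSON file instead of querying the website
    with open(data_) as json_file:
      data = json.load(json_file)

  # one row per entry of 'values'; the scalar fields are repeated on every row
  data = pd.DataFrame(data)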

After the conversion into a pandas dataset, this is the result:

Extracting time and values

Each entry of the values column is a dictionary with two items: a Unix timestamp and the value itself. I need to unpack it to extract both the timestamp and the market price (or any other value we have been downloading from the website).

  # make a backup
  data_copy = data.copy()
  def extract_time(dict1):
    # return the dictionary values as a list: [timestamp, value]
    return list(dict1.values())
  def time_converter(time1):
    from datetime import datetime
    # convert the Unix timestamp into a 'YYYY-MM-DD' string (local time zone)
    return datetime.fromtimestamp(time1).strftime('%Y-%m-%d')
  # preprocessing data
  data = data_copy.copy()
  data['timestamp'] = data['values'].apply(lambda x : extract_time(x)[0])
  data['timestamp'] = data['timestamp'].apply(lambda x : time_converter(x))
  data[chart_name] = data['values'].apply(lambda x : extract_time(x)[1])
  data.pop('values')
  # keep only the timestamp and the value columns
  df = data.drop(['status', 'name', 'unit', 'period', 'description'], axis=1)

So far, the function lets me extract data day by day. If I want to go further and group it by week, I need to average the values within each week to obtain a valid metric.

  if compress:
    # group by date (average the values that share the same day)
    df = group_chunks(df, 0, 1)
    # replace each date with its week number
    df['week'] = df[0].apply(lambda x : convert_week(x))
    df.pop(0)
    # group by week (column 0 holds the week, column 1 the weekly average)
    df = group_chunks(df, 1, 0)
  return df
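To see what the parsing steps do without calling the website, they can be tried on a small hand-written payload. This is a sketch under assumptions: the "x" and "y" key names and all the numbers are made up for illustration, and it uses pd.to_datetime instead of the article's time_converter. Unlike datetime.fromtimestamp, which converts to the local time zone, pd.to_datetime with unit="s" interprets the timestamps as UTC.

```python
import json
import pandas as pd

# Made-up payload mimicking the structure the function expects;
# the "x"/"y" key names and all numbers are assumptions
sample = json.loads("""
{"status": "ok", "name": "Market Price", "unit": "USD", "period": "day",
 "description": "sample",
 "values": [{"x": 1577836800, "y": 7200.0}, {"x": 1577923200, "y": 7000.0}]}
""")

# One row per entry of "values": timestamp (seconds since epoch) and value
daily = pd.DataFrame(sample["values"])
daily["day"] = pd.to_datetime(daily["x"], unit="s").dt.strftime("%Y-%m-%d")
daily = daily[["day", "y"]].rename(columns={"y": "market-price"})
print(daily)
```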

Downloading

As you can see, there are multiple parameters you can play with, depending on the output you wish to obtain.

Data grouped by day

market_cap = download_data('market-cap', compress=False)
market_cap.columns = ['day', 'market_cap']
market_cap

Data grouped by week

#download data compressed by week
market_cap = download_data('market-cap')
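As an alternative to numbering the weeks by hand, daily data can also be averaged per week with pandas' resample. The sketch below uses made-up values and is not the method used by download_data: it labels each week by the Sunday on which it ends rather than by a week number.

```python
import pandas as pd

# Made-up daily values: the seven days of one week plus the following Monday
daily = pd.DataFrame({
    "day": pd.date_range("2020-01-06", periods=8, freq="D"),
    "market_cap": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 10.0],
})

# Average per calendar week; "W" bins end on Sunday by default
weekly = daily.set_index("day")["market_cap"].resample("W").mean()
print(weekly)
```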

Data from downloaded JSON

Unfortunately, I found a few graphs missing from the website. Transaction Fees per USD is an example. If you need similar data, you can download the JSON file directly (but manually):

When importing it into your notebook, use the following setting to extract the data without performing a GET request.

#importing data by json, then compressing
#transaction_fees = download_data('transaction-fees-usd') #url not working
transaction_fees = download_data(0, data_='/content/drive/My Drive/Colab Notebooks/Projects_Work/20200617_Bitcoin/fees-usd-per-transaction.json')
transaction_fees.columns = ['week', 'transaction_fees']
transaction_fees

Because the weeks are counted from 2015 and I have downloaded earlier data, the weeks before 2015 get a zero or negative number. You can delete those rows if you wish to discard them.
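Discarding the rows that precede the benchmark year takes a single boolean mask. The values below are made up; only the column names match the ones assigned above.

```python
import pandas as pd

# Made-up weekly data; week numbers <= 0 fall before the benchmark year
transaction_fees = pd.DataFrame({
    "week": [-1, 0, 1, 2],
    "transaction_fees": [0.10, 0.12, 0.15, 0.11],
})

# Keep only the weeks counted from the benchmark year onward
transaction_fees = transaction_fees[transaction_fees["week"] > 0].reset_index(drop=True)
print(transaction_fees)
```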

Merging Datasets

In my case, I have downloaded several different datasets and need to merge them. I will use the week number as the common key to join them all.

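# tps, market_cap, total_bitcoins, market_price, miners_revenue, transaction_fees
# (each one a weekly DataFrame obtained with download_data as shown above)
datasets = {'tps': tps, 'market_cap': market_cap, 'total_bitcoins': total_bitcoins,
            'market_price': market_price, 'miners_revenue': miners_revenue,
            'transaction_fees': transaction_fees}
result = None
for name, frame in datasets.items():
  # give every frame the columns ['week', name] so the merges cannot clash on names
  frame = frame.rename(columns=dict(zip(frame.columns, ['week', name])))
  result = frame if result is None else pd.merge(result, frame, on='week')
result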
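By default pd.merge performs an inner join, so a week that is missing from any one dataset disappears from the merged result. The toy frames below (made-up values) illustrate the difference from an outer join, which keeps every week and fills the gaps with NaN.

```python
import pandas as pd

# Made-up weekly frames; week 3 is missing from market_price
tps = pd.DataFrame({"week": [1, 2, 3], "tps": [2.5, 2.7, 3.0]})
market_price = pd.DataFrame({"week": [1, 2], "market_price": [7200.0, 7000.0]})

# Inner join (the pd.merge default): only weeks present in both frames survive
inner = pd.merge(tps, market_price, on="week")
# Outer join: keeps every week and fills the gaps with NaN
outer = pd.merge(tps, market_price, on="week", how="outer")
print(inner)
print(outer)
```

Whether dropped weeks or NaN values are preferable depends on how the model downstream handles gaps in the series.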

Exporting Data

market_price.to_csv('bitcoin.csv')
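Note that to_csv also writes the DataFrame index as an extra first column unless index=False is passed. A minimal round trip, using an in-memory buffer as a stand-in for the file and made-up values, might look like this:

```python
import io
import pandas as pd

# Made-up weekly prices standing in for market_price
market_price = pd.DataFrame({"week": [1, 2], "market_price": [7200.0, 7000.0]})

buffer = io.StringIO()  # in-memory stand-in for 'bitcoin.csv'
market_price.to_csv(buffer, index=False)  # index=False skips the row-number column
buffer.seek(0)

# Reading the data back yields the same two columns
restored = pd.read_csv(buffer)
print(restored.equals(market_price))
```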

Part II

We are all set for part II…

--> Go to part II (under construction)
