Matt Przybyla

Summary

This article discusses three open-source cryptocurrency datasets available on Kaggle for practicing data science and machine learning algorithms.

Abstract

The article "Top Cryptocurrency Datasets for Data Scientists" introduces three cryptocurrency datasets suitable for practicing data science and machine learning algorithms. The first dataset, TOP 50 Cryptocurrencies Historical Prices, includes open, high, low, and close prices per day for the top 50 cryptocurrencies. The second dataset, Top 10 Cryptocurrencies Historical Dataset, focuses on the top 10 cryptocurrencies with more recent data. The third dataset, Historical data on the trading of cryptocurrencies, contains additional columns like Market Cap, Capitalization Change 1 Day, BTC Price Change 1 Day, and Crypto Type. The article provides links to the datasets and suggests potential use cases for each dataset.

Top Cryptocurrency Datasets for Data Scientists

Kaggle datasets for practicing data science and machine learning algorithms

Photo by Austin Distel on Unsplash [1].

Table of Contents

  1. Introduction
  2. TOP 50 Cryptocurrencies Historical Prices
  3. Top 10 Cryptocurrencies Historical Dataset
  4. Historical data on the trading of cryptocurrencies
  5. Summary
  6. References

Introduction

The purpose of this article is to discuss three cryptocurrency datasets, so that you can practice data science on a more topical subject and its data. As a disclaimer, this is not financial advice; this article is simply an aggregation of usable, open-source datasets for you to experiment with. Much of the time, you will start learning data science with an S&P 500 stocks dataset, but with the growing popularity of crypto, it is time to include more discussion around this type of data. With that being said, let’s take a closer look at these datasets so you can decide which one to use for your next project.

TOP 50 Cryptocurrencies Historical Prices

Photo by Markus Winkler on Unsplash [2].

This first dataset is actually a collection of datasets: 50, to be exact. Like typical stock datasets, it includes the expected open, high, low, and close prices per day. The great part about this dataset is that it also offers a combined dataset, in addition to a separate dataset for each cryptocurrency, so you can focus on crypto in general or on one specific currency. All of these datasets are good for practicing time-series problems and predictions.

Here is the link to this dataset: TOP 50 Cryptocurrencies Historical Prices [3]

The dataset contains the following columns (along with an example value):

  • Currency Name (Aave; also known as a serial number)
  • Date (2018-01-30)
  • Price (0.15; also known as the closing price)
  • Open (0.17)
  • High (0.17)
  • Low (0.14)
  • Vol. (530470.0)
  • Change % (-7.95)

Date range: 2017-07-10 to 2021-08-23

This dataset is useful for practicing the prediction of a numeric, continuous value, such as the closing, open, high, or low price. As an exercise, you could also try predicting the trading volume for a day, or the percentage change from the previous day. The values are already in a usable format, so you most likely will not need to perform any transformations.
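A minimal sketch of that exercise follows, in plain Python with only the standard library. It assumes rows from the combined CSV have already been read into dictionaries keyed by the column names listed above (for example with `csv.DictReader`), with ISO-style dates like the example value; the helper names and sample rows are hypothetical. It pulls one currency's closing prices in date order, recomputes the day-over-day percentage change, and scores the naive "tomorrow equals today" forecast, a common first baseline for time-series prediction.

```python
from datetime import date


def series_for(rows, currency):
    """Return (date, closing price) pairs for one currency, oldest first."""
    picked = [
        (date.fromisoformat(r["Date"]), float(r["Price"]))
        for r in rows
        if r["Currency Name"] == currency
    ]
    return sorted(picked)


def pct_change(prices):
    """Day-over-day percentage change; the first day has no previous close."""
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(prices, prices[1:])]


def naive_baseline_mae(prices):
    """Mean absolute error of forecasting each close as the previous close."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    errors = [abs(curr - prev) for prev, curr in zip(prices, prices[1:])]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical rows shaped like the column list above; not real data.
    rows = [
        {"Currency Name": "Aave", "Date": "2018-01-31", "Price": "0.16"},
        {"Currency Name": "Aave", "Date": "2018-01-30", "Price": "0.15"},
        {"Currency Name": "Other", "Date": "2018-01-30", "Price": "9.99"},
    ]
    prices = [price for _, price in series_for(rows, "Aave")]
    print(prices, pct_change(prices), naive_baseline_mae(prices))
```

Whatever model you try on this data, its error is worth comparing against this naive baseline before trusting it.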

Top 10 Cryptocurrencies Historical Dataset

Photo by Christopher Burns on Unsplash [4].

This next dataset is similar but contains fewer cryptocurrencies, with the benefit of more up-to-date information. At the time of writing, the most recent date is November 2nd, 2021, which is about two months more recent than the previous dataset. The data is split into a separate CSV for each cryptocurrency, so if you want to combine them all, you will need to take the name from each file and store it as an index or a new column.
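Combining the files as described might look like the sketch below, again using only the Python standard library. It assumes each CSV is named after its cryptocurrency (for example a hypothetical `Bitcoin.csv`) and that all files share the same header row; the function names and the new `Crypto` column are illustrative choices, not part of the dataset.

```python
import csv
from pathlib import Path


def tag_rows(name, lines):
    """Parse CSV text lines and add a "Crypto" column holding `name`."""
    rows = []
    for row in csv.DictReader(lines):
        row["Crypto"] = name
        rows.append(row)
    return rows


def combine_csvs(folder):
    """Stack every CSV in `folder`, labelling each row with its file's stem."""
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            combined.extend(tag_rows(path.stem, handle))
    return combined
```

Calling `combine_csvs("path/to/folder")` (a placeholder path) returns one list of dictionaries with the file name stored as a new column. If you prefer pandas, passing the file stems as `keys=` to `pandas.concat` attaches the same label as an outer index level instead.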

Here is the link to this dataset: Top 10 Cryptocurrencies Historical Dataset [5]

The dataset contains the following columns (along with an example value):

  • Date (Nov 02, 2021)
  • Price (560.48)
  • Open (551.5)
  • High (563.7)
  • Low (537.62)
  • Volume (1.37M)
  • Change in price (1.62%)

Date range: 2017-11-08 to 2021-11-01

As you can imagine, you could perform the same tasks with this dataset as with the previous one. You could also predict the day's high and low prices, or create a new feature, such as the total change from high to low on a given day. With this dataset, you might want to transform some values to make them easier to work with, converting the date to a DateTime format and the volume (for example, 1.37M) to a numeric format.
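Those transformations could be sketched as follows, using only the Python standard library. It assumes the raw strings look like the example values listed (`Nov 02, 2021`, `1.37M`, `1.62%`); the `K` and `B` suffixes are an assumption beyond the single `M` example shown, and all helper names are hypothetical. With the default C locale, `%b` matches English month abbreviations. The last helper computes the high-to-low spread suggested above as a new feature.

```python
from datetime import datetime

# Assumed multipliers for abbreviated volumes; only "M" appears in the example.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_date(text):
    """Convert a string like "Nov 02, 2021" to a datetime.date."""
    return datetime.strptime(text.strip(), "%b %d, %Y").date()


def parse_volume(text):
    """Convert "1.37M" to roughly 1.37 million; plain numbers pass through."""
    text = text.strip().replace(",", "")
    if text and text[-1].upper() in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1].upper()]
    return float(text)


def parse_percent(text):
    """Convert "1.62%" to the float 1.62."""
    return float(text.strip().rstrip("%"))


def daily_range(high, low):
    """High-to-low spread for a single day, usable as a new feature."""
    return high - low


if __name__ == "__main__":
    # Example values taken from the column list above.
    print(parse_date("Nov 02, 2021"), parse_volume("1.37M"), parse_percent("1.62%"))
    print(daily_range(563.7, 537.62))
```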

Historical data on the trading of cryptocurrencies

Photo by Jason Briscoe on Unsplash [6].

This dataset is the least up-to-date; however, it is still fairly recent, with its latest data from two months ago. It also already comes as a single CSV file, which can be a plus.

Here is the link to this dataset: Historical data on the trading of cryptocurrencies [7]

The dataset contains the following columns (along with an example value):

  • Trade Date (2016-01-01)
  • Volume (36278900)
  • Price USD (434.33)
  • Price BTC (1)
  • Market Cap (6529299589)
  • Capitalization Change 1 Day (-0.0018239478580617)
  • USD Price Change 1 Day (-0.0020491331476066)
  • BTC Price Change 1 Day (0)
  • Crypto Name (Bitcoin)
  • Crypto Type (0)

Date range: 2016-01-01 to 2021-08-09

As you can see, this dataset has a few more columns than the other two in this article. Not only that, but several of them are new altogether, specifically Market Cap, Capitalization Change 1 Day, BTC Price Change 1 Day, and Crypto Type. You could use any of these new columns as your target variable. The values also already appear to be in a clean numeric format, so you most likely will not need to transform them.
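To make the target-variable idea concrete, the sketch below turns rows of this single CSV into supervised pairs: each day's Volume, Price USD, and Market Cap become features, and the next day's Capitalization Change 1 Day becomes the target. This choice of features and target is only one illustrative option; the rows are assumed to be dictionaries keyed by the column names above, and `supervised_pairs` is a hypothetical helper. Grouping by Crypto Name before shifting keeps one currency's target from leaking into another currency's rows.

```python
from collections import defaultdict
from datetime import date

FEATURES = ["Volume", "Price USD", "Market Cap"]  # illustrative choice
TARGET = "Capitalization Change 1 Day"            # illustrative choice


def supervised_pairs(rows, features=FEATURES, target=TARGET):
    """Build (today's features, tomorrow's target) pairs for each crypto."""
    by_crypto = defaultdict(list)
    for row in rows:
        by_crypto[row["Crypto Name"]].append(row)

    X, y = [], []
    for crypto_rows in by_crypto.values():
        # Sort each currency's rows by date so "tomorrow" really is later.
        crypto_rows.sort(key=lambda r: date.fromisoformat(r["Trade Date"]))
        for today, tomorrow in zip(crypto_rows, crypto_rows[1:]):
            X.append([float(today[c]) for c in features])
            y.append(float(tomorrow[target]))
    return X, y
```

The sketch pairs consecutive rows without checking for missing days, so a gap in the data would silently pair non-adjacent dates; verifying that consecutive Trade Date values differ by exactly one day is a simple safeguard.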

Summary

Overall, there will always be plenty of datasets, whether for traditional stocks or cryptocurrencies, but I hope this aggregation and summary helps you quickly decide on one to use. Whatever you want to predict, and whether or not you want to transform the data, these are all great datasets for practicing data science modeling.

To summarize, here are the names of the three cryptocurrency datasets:

  • TOP 50 Cryptocurrencies Historical Prices
  • Top 10 Cryptocurrencies Historical Dataset
  • Historical data on the trading of cryptocurrencies

I hope you found my article both interesting and useful. Please feel free to comment below on whether you agree or disagree with the datasets I included, and why. What other datasets do you think are important to include? These could certainly be clarified even further, but I hope I was able to shed some light on some interesting cryptocurrency datasets. Thank you for reading!

I am not affiliated with any of these companies.

Please feel free to check out my profile, Matt Przybyla, and my other articles. You can also subscribe to receive email notifications for my blogs by following the link below, or by clicking the subscribe icon at the top of the screen next to the follow icon. Reach out to me on LinkedIn if you have any questions or comments.

Subscribe link: https://datascience2.medium.com/subscribe

References

[1] Photo by Austin Distel on Unsplash, (2019)

[2] Photo by Markus Winkler on Unsplash, (2020)

[3] Kaggle and Sanskar Hasija, TOP 50 Cryptocurrencies Historical Prices, (2021)

[4] Photo by Christopher Burns on Unsplash, (2017)

[5] Kaggle and Kash, Top 10 Cryptocurrencies Historical Dataset, (2021)

[6] Photo by Jason Briscoe on Unsplash, (2019)

[7] Kaggle and George Zakharov, Historical data on the trading of cryptocurrencies, (2021)
