Time Series Forecasting Of Bitcoin Prices Using Prophet

Time-series forecasting, decomposition, cross-validation, and performance evaluation

Prophet is an open-source Python library for time series forecasting developed by Facebook. It can automatically detect yearly, weekly, and daily seasonality, and it quickly decomposes a forecast into trend and seasonality effects.

In this tutorial, we will make a time-series prediction of Bitcoin prices. The following topics will be covered:

How to train a time series forecasting model using Prophet?
How to make predictions and do time series decomposition?
How to identify changing points in the trend?
How to do time series cross-validation?
How to evaluate time series model performance using Prophet?

Resources for this post:

Video tutorial for this post on YouTube
Python code is at the end of the post. Click here for the notebook.
More video tutorials on time series
More blog posts on time series

The purpose of this tutorial is machine learning education only. It is not investment advice. Therefore, please do not make an investment based on the information in this tutorial.

Let’s get started!

Step 1: Install And Import Libraries

In the first step, we will install and import libraries. Two Python packages need to be installed, yfinance and prophet.

# Install libraries
pip install yfinance prophet

After the package installation, we need to import libraries for this tutorial.

numpy and pandas are for data processing. yfinance is for pulling the data.

Prophet is for building the time series forecast. prophet.plot is for model output visualization, and prophet.diagnostics is for model performance evaluation.

plotly is imported to visualize the Bitcoin price trend.

# Data processing
import numpy as np
import pandas as pd

# Get time series data
import yfinance as yf

# Prophet model for time series forecast
from prophet import Prophet
from prophet.plot import add_changepoints_to_plot, plot_cross_validation_metric
from prophet.diagnostics import cross_validation, performance_metrics

# Visualization
import plotly.graph_objs as go

Step 2: Get Bitcoin Price Data

In the second step, the Bitcoin price data is downloaded from the Yahoo Finance API. We use two years of daily data, covering 2018 and 2019.

Yahoo Finance returns the data with the date as the index. Using reset_index, we create a new index and turn the date into a column, because Prophet requires the date-time variable to be a column in the model input.

pd.to_datetime then makes sure the date column is in datetime format, which matters if the dates arrive as strings.

# Download Bitcoin data
data = yf.download(tickers='BTC-USD', start='2018-01-01', end='2019-12-31', interval = '1d')

# Reset index and have date as a column
data.reset_index(inplace=True)

# Change date to datetime format
data['Date'] = pd.to_datetime(data['Date'])

# Take a look at the data
data.head()

Download Bitcoin price data — image from GrabNGoInfo.com

From the trend chart, we can see that the Bitcoin price started to decrease in January 2018 and started to increase again in April 2019.

# Declare a figure
fig = go.Figure()

# Candlestick chart
fig.add_trace(go.Candlestick(x=data['Date'],
                open=data['Open'],
                high=data['High'],
                low=data['Low'],
                close=data['Close'],
                name='Bitcoin Data'))

# Display the chart
fig.show()

Bitcoin price time-series — image from GrabNGoInfo.com

In this tutorial, we will forecast the Bitcoin close price. Prophet takes two columns as inputs, a datetime column called ds and a value column called y. Therefore, we need to drop all the other columns, rename Date to ds and Close to y.

# Keep only date and close price (selecting columns avoids errors if a column such as 'Adj Close' is absent)
df = data[['Date', 'Close']].copy()

# Rename date to ds and close price to y
df.rename(columns={'Date': 'ds', 'Close': 'y'}, inplace=True)

# Take a look at the data
df.head()

Prophet model data — Image from GrabNGoInfo.com

Using .info(), we can see that the dataset has 730 records and two columns, ds and y. ds is in datetime format, and y is in float format. There is no missing data in the dataset.

# Data information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   ds      730 non-null    datetime64[ns]
 1   y       730 non-null    float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 11.5 KB
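.info() confirms that no values are null, but it does not show whether any calendar days are absent. One possible check is to compare the dates against a complete daily calendar. The sketch below uses a synthetic stand-in for df (the values are made up), and missing_dates is a hypothetical helper name, not part of pandas or Prophet.

```python
import numpy as np
import pandas as pd


def missing_dates(frame: pd.DataFrame, date_col: str = "ds") -> pd.DatetimeIndex:
    """Return the calendar days between the first and last date that are absent."""
    full_range = pd.date_range(frame[date_col].min(), frame[date_col].max(), freq="D")
    return full_range.difference(pd.DatetimeIndex(frame[date_col]))


# Synthetic stand-in for the tutorial's df: one row per day in 2018 and 2019
demo = pd.DataFrame({"ds": pd.date_range("2018-01-01", "2019-12-31", freq="D")})
demo["y"] = np.linspace(1.0, 2.0, len(demo))  # made-up values, not real prices

print(len(demo), "rows,", len(missing_dates(demo)), "missing days")

# Dropping two rows shows what a gap looks like
gappy = demo.drop(index=[10, 11])
print(missing_dates(gappy))
```

A full daily calendar for 2018 and 2019 has 730 rows, which matches the record count reported by .info() above.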

Step 3: Train Test Split

In step 3, we create a training and a testing dataset. A random split is not appropriate for time series data because it leaks information from future dates into the past. Instead, a cutoff date is usually selected: the data up to the cutoff date is the training dataset, and the data after it is the testing dataset.

In this example, 2019-11-30 is used as the cutoff date. The first 23 months are used for model training, and the last month is used for testing.

# Train test split
df_train = df[df['ds']<='2019-11-30']
df_test = df[df['ds']>'2019-11-30']

# Print the number of records and date range for training and testing dataset.
print('The training dataset has', len(df_train), 'records, ranging from', df_train['ds'].min(), 'to', df_train['ds'].max())
print('The testing dataset has', len(df_test), 'records, ranging from', df_test['ds'].min(), 'to', df_test['ds'].max())

Output

The training dataset has 699 records, ranging from 2018-01-01 00:00:00 to 2019-11-30 00:00:00
The testing dataset has 31 records, ranging from 2019-12-01 00:00:00 to 2019-12-31 00:00:00
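The same split can be wrapped in a small helper that also guards against leakage. This is only a sketch on synthetic dates with placeholder values, and split_by_cutoff is a hypothetical name introduced for illustration.

```python
import pandas as pd


def split_by_cutoff(frame, cutoff, date_col="ds"):
    """Split chronologically: rows up to and including cutoff train, later rows test."""
    cutoff = pd.Timestamp(cutoff)
    train = frame[frame[date_col] <= cutoff]
    test = frame[frame[date_col] > cutoff]
    # Leakage guard: every training date must precede every testing date
    if len(train) and len(test):
        assert train[date_col].max() < test[date_col].min()
    return train, test


# Synthetic stand-in covering the same dates as the tutorial's df
demo = pd.DataFrame({"ds": pd.date_range("2018-01-01", "2019-12-31", freq="D")})
demo["y"] = range(len(demo))  # placeholder values, not real prices

train, test = split_by_cutoff(demo, "2019-11-30")
print(len(train), len(test))  # 699 31
```

The counts agree with the output above: 699 training records and 31 testing records.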

Step 4: Train Time Series Model Using Prophet

In step 4, we will train the time series model using the training dataset.

interval_width specifies the width of the prediction interval. We changed it from the default of 80% to 95%, which makes the range between the lower bound and the upper bound of the prediction wider.

n_changepoints is the number of potential change points placed in the time series trend. The default value is 25. Based on the shape of the Bitcoin price data, we set it to 7.

# Create the prophet model with a prediction interval of 95%
m = Prophet(interval_width=0.95, n_changepoints=7)

# Fit the model using the training dataset
m.fit(df_train)

The yearly seasonality and daily seasonality are automatically disabled. This is because the training dataset covers less than two full years, and the data has no observations at a frequency finer than one day.
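The decision can be approximated with a few rules on the span and spacing of the dates. The sketch below reflects our understanding of Prophet's automatic mode (yearly needs at least 730 days of history, weekly at least two weeks, daily at least two days, and the smallest gap between observations must be shorter than the season). Treat these thresholds as an assumption and check the Prophet source for your version. auto_seasonalities is a hypothetical helper, not a Prophet function.

```python
import pandas as pd


def auto_seasonalities(dates: pd.Series) -> dict:
    """Approximate which seasonalities the automatic mode would switch on."""
    span = dates.max() - dates.min()
    gaps = dates.sort_values().diff().dropna()
    min_gap = gaps[gaps > pd.Timedelta(0)].min()
    # Thresholds below are assumptions based on our reading of Prophet
    return {
        "yearly": span >= pd.Timedelta(days=730),
        "weekly": span >= pd.Timedelta(weeks=2) and min_gap < pd.Timedelta(weeks=1),
        "daily": span >= pd.Timedelta(days=2) and min_gap < pd.Timedelta(days=1),
    }


# Same date range as the training dataset in this tutorial
train_dates = pd.Series(pd.date_range("2018-01-01", "2019-11-30", freq="D"))
print(auto_seasonalities(train_dates))
```

For the training dates this gives weekly seasonality on and yearly and daily seasonality off. When the model in this tutorial is fitted, Prophet reports the same decision in its log: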

INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
WARNING:prophet.models:Optimization terminated abnormally. Falling back to Newton.
<prophet.forecaster.Prophet at 0x7fd5632e9f90>

Step 5: Use Prophet Model To Make Prediction

Step 5 uses the trained Prophet model to make the prediction. make_future_dataframe with periods=31 returns the training dates plus the 31 days that follow, so the added dates are the same as the dates in the testing dataset we created above.

The prediction output contains a lot of information. We keep the predicted value yhat and the lower and upper bounds of its prediction interval.

# Create a future dataframe for prediction
future = m.make_future_dataframe(periods=31)

# Forecast the future dataframe values
forecast = m.predict(future)

# Check the forecasted values and upper/lower bound
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Prophet time-series forecast — Image from GrabNGoInfo.com

In the forecast visualization, the x-axis is the date and the y-axis is the Bitcoin close price. The black dots are the actual prices in the training dataset, and the red dots are the actual prices in the testing dataset. The blue line is the time series model prediction. The shaded area is the 95% prediction interval.

# Visualize the forecast
fig = m.plot(forecast)
ax = fig.gca()
ax.plot( df_test["ds"], df_test["y"], 'r.')

Prophet Time-series Prediction — Image from GrabNGoInfo
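The chart gives a visual comparison, and a numeric one takes only a few lines. The sketch below uses small made-up frames standing in for df_test and for the tail of forecast, so the numbers are illustrative only. It computes the mean absolute error, the mean absolute percentage error, and the share of actual values that fall inside the prediction interval.

```python
import pandas as pd

# Hypothetical stand-ins for df_test and the last rows of forecast (made-up values)
actual = pd.DataFrame({
    "ds": pd.date_range("2019-12-01", periods=4, freq="D"),
    "y": [7400.0, 7300.0, 7200.0, 7500.0],
})
predicted = pd.DataFrame({
    "ds": pd.date_range("2019-12-01", periods=4, freq="D"),
    "yhat": [7500.0, 7400.0, 7600.0, 7450.0],
    "yhat_lower": [7000.0, 6900.0, 7300.0, 6950.0],
    "yhat_upper": [8000.0, 7900.0, 7900.0, 7950.0],
})

# Align actual and predicted values on the date column
compared = actual.merge(predicted, on="ds", how="inner")

abs_error = (compared["y"] - compared["yhat"]).abs()
mae = abs_error.mean()
mape = (abs_error / compared["y"].abs()).mean()
# Share of actual values that fall inside the prediction interval
coverage = compared["y"].between(compared["yhat_lower"], compared["yhat_upper"]).mean()

print(f"MAE={mae:.2f}, MAPE={mape:.4f}, coverage={coverage:.2f}")
```

With the real data, actual would be df_test and predicted would be the forecast columns ds, yhat, yhat_lower, and yhat_upper.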

Step 6: Time Series Decomposition

In step 6, we will decompose the time series forecast.

From the trend chart, we can see a decreasing trend from early 2018 to early 2019, an increasing trend from April 2019 to July 2019, and a decreasing trend after July 2019.

# Visualize the components
m.plot_components(forecast);

The weekly seasonality chart shows that Bitcoin prices decrease starting Monday and reach their lowest point on Thursday. Then the prices start to increase and reach their highest point on Saturday.

Time-series Decomposition — Image from GrabNGoInfo.com

Step 7: Identify Change Points

In step 7, we will discuss how to identify the change points in the time series trend.

Prophet automatically identifies the change points in time series data following the steps below:

Use the first 80% of the time series for identifying change points. 80% is the default, but it is a hyperparameter (changepoint_range) that we can change.
Place a large number of uniformly distributed dates where the trajectory could change.
Apply a sparse prior on the magnitudes of the rate changes, which is similar to L1 regularization.

We can list the dates corresponding to the changepoints using .changepoints.

# Default change points
print(f'There are {len(m.changepoints)} change points. \nThe change points dates are \n{df.loc[df["ds"].isin(m.changepoints)]}')

Output

There are 7 change points. 
The change points dates are 
            ds             y
80  2018-03-22   8728.469727
159 2018-06-09   7531.979980
239 2018-08-28   7096.279785
319 2018-11-16   5575.549805
399 2019-02-04   3459.154053
478 2019-04-24   5464.866699
558 2019-07-13  11392.378906
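The seven row positions in this output follow from the uniform placement described above. The sketch below reproduces them with numpy under the assumption that Prophet spreads n_changepoints + 1 positions evenly over the first 80% of the training rows and drops the first one. It is an illustration of that rule, not Prophet's own code.

```python
import numpy as np
import pandas as pd

# Dates of the training dataset used in this tutorial (699 daily rows)
history = pd.Series(pd.date_range("2018-01-01", "2019-11-30", freq="D"))

n_changepoints = 7       # value passed to Prophet in step 4
changepoint_range = 0.8  # default share of the history that is searched

# Only the first 80% of the history is eligible for changepoints
hist_size = int(np.floor(len(history) * changepoint_range))

# Spread n_changepoints + 1 positions evenly, then drop the first one (row 0)
positions = np.linspace(0, hist_size - 1, n_changepoints + 1).round().astype(int)[1:]

print(positions.tolist())
print(history.iloc[positions].dt.strftime("%Y-%m-%d").tolist())
```

The positions 80, 159, 239, 319, 399, 478, and 558 and their dates match the rows listed above. Which of these candidates end up with a noticeable rate change is decided by the model fit, not by the placement.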

In the visualization, the red dotted lines represent the changepoints. The chart does not show all seven changepoints. Only the ones whose rate change is large enough (above the threshold argument of add_changepoints_to_plot, 0.01 by default) are drawn.

# Change points to plot
fig = m.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), m, forecast)

Time-series Change Points — Image from GrabNGoInfo.com
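The filtering behind the chart can be illustrated with plain numpy. The rate changes below are made-up numbers, not output from the fitted model, and the 0.01 threshold is assumed to be the default of add_changepoints_to_plot.

```python
import numpy as np

# Changepoint dates from the output above
changepoint_dates = np.array(
    ["2018-03-22", "2018-06-09", "2018-08-28", "2018-11-16",
     "2019-02-04", "2019-04-24", "2019-07-13"],
    dtype="datetime64[D]",
)
# Hypothetical rate changes, one per changepoint (not real model output)
delta = np.array([-0.300, 0.004, -0.002, -0.150, 0.450, 0.008, -0.600])

threshold = 0.01  # assumed default of add_changepoints_to_plot

# Keep only changepoints whose absolute rate change reaches the threshold
significant = changepoint_dates[np.abs(delta) >= threshold]
print(significant)
```

With these illustrative values, four of the seven changepoints would be drawn. Passing a smaller threshold to add_changepoints_to_plot would show more of them.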

Step 8: Cross-Validation

In step 8, we will do cross-validation for the time series model. Prophet has a cross_validation function to automate the comparison between the actual and the predicted values.

m is the trained model.
initial='500 days' means the initial model will be trained on the first 500 days of data.
period='60 days' means 60 days will be added to the training dataset for each additional model.
horizon = '30 days' means that the model forecasts the next 30 days. When only horizon is given, Prophet defaults initial to be triple the horizon, and period to be half of the horizon.
parallel="processes" enables parallel processing for cross-validation. When the parallel cross-validation can be done on a single machine, the processes option usually provides the highest performance. For larger problems, dask can be used to run cross-validation on multiple machines.

# Cross validation
df_cv = cross_validation(m, initial='500 days', period='60 days', horizon = '30 days', parallel="processes")
df_cv.head()

Cross-validation runs on the data the model was fitted on, which is the 699-day training dataset. With 500 days for the initial training window, a 60-day period, and a 30-day horizon, there is enough data to train three models and forecast the next 30 days for each.

INFO:prophet:Making 3 forecasts with cutoffs between 2019-07-03 00:00:00 and 2019-10-31 00:00:00
INFO:prophet:Applying in parallel with <concurrent.futures.process.ProcessPoolExecutor object at 0x7fd562df9e90>
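The three cutoffs reported in the log can be reproduced by hand. The sketch below assumes that the last cutoff is the final training date minus the horizon, and that earlier cutoffs step back by one period for as long as at least the initial window of data lies before them. sketch_cutoffs is a hypothetical helper written for this illustration, not Prophet's implementation.

```python
from datetime import date, timedelta


def sketch_cutoffs(first_day, last_day, initial, period, horizon):
    """Walk backwards from the last possible cutoff in steps of period."""
    cutoff = last_day - horizon     # latest cutoff that still leaves a full horizon
    earliest = first_day + initial  # earlier cutoffs would lack training data
    cutoffs = []
    while cutoff >= earliest:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


cutoffs = sketch_cutoffs(
    first_day=date(2018, 1, 1),
    last_day=date(2019, 11, 30),    # last day of the training dataset
    initial=timedelta(days=500),
    period=timedelta(days=60),
    horizon=timedelta(days=30),
)
print([c.isoformat() for c in cutoffs])  # ['2019-07-03', '2019-09-01', '2019-10-31']
```

A fourth cutoff would fall on 2019-05-04, which is earlier than 500 days after the first training date, so only three forecasts are made.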

Time-series Cross-validation — Image from GrabNGoInfo.com

Step 9: Prophet Model Performance Evaluation

Step 9 evaluates the cross-validation model performance.

MSE (Mean Squared Error) is the sum of the squared differences between the actual and predicted values, divided by the number of predictions.
RMSE (Root Mean Squared Error) is the square root of MSE.
MAE (Mean Absolute Error) is the sum of the absolute differences between the actual and predicted values, divided by the number of predictions.
MAPE (Mean Absolute Percentage Error) is the sum of the absolute percentage differences between the actual and predicted values, divided by the number of predictions. MAPE is independent of the magnitude of the data, so it can be used to compare different forecasts, but it is undefined when an actual value is zero.
MDAPE (Median Absolute Percentage Error) is similar to MAPE, except that it takes the median instead of the mean of the absolute percentage differences.
SMAPE (Symmetric Mean Absolute Percentage Error) is similar to MAPE, except that the denominator of each percentage error is the average of the absolute actual and predicted values instead of the actual value.
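The definitions above translate directly into numpy. The sketch below computes all six metrics on toy numbers chosen for easy checking. forecast_metrics is a hypothetical helper, and it evaluates the whole array at once, whereas Prophet's performance_metrics aggregates the errors by forecast horizon.

```python
import numpy as np


def forecast_metrics(y, yhat):
    """Compute the six error metrics listed above for paired arrays."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    error = y - yhat
    ape = np.abs(error) / np.abs(y)  # undefined when an actual value is zero
    mse = np.mean(error ** 2)
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(error)),
        "mape": np.mean(ape),
        "mdape": np.median(ape),
        "smape": np.mean(2 * np.abs(error) / (np.abs(y) + np.abs(yhat))),
    }


# Toy numbers chosen for easy checking, not Bitcoin prices
metrics = forecast_metrics(y=[100, 200, 400], yhat=[110, 180, 400])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy example the absolute percentage errors are 10%, 10%, and 0%, so MAPE is about 6.67% while MDAPE is 10%, which shows how the mean and the median can differ.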

# Model performance metrics
df_p = performance_metrics(df_cv)
df_p.head()

Time Series Forecast Performance Metrics — Image from GrabNGoInfo.com

The plot_cross_validation_metric function from Prophet plots the cross-validation performance results.

The x-axis is the horizon. Because we set the horizon to be 30 days, the x-axis has a value up to 30.
The y-axis is the metric we are interested in. We use mape as an example in this visualization.
On each day, we can see three dots. This is because there are three models in the cross-validation, and each dot represents the MAPE from one model.
The line is the aggregated performance across all the models. We can see that the MAPE value increases with the number of days, which is expected because time series models tend to make better predictions for the near future than for the far future.

# Visualize the performance metrics
fig = plot_cross_validation_metric(df_cv, metric='mape')

Time-series Plot Cross-validation Metrics — Image from GrabNGoInfo.com
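The relationship between the dots and the line can be sketched with pandas. The frame below is a made-up miniature with the same column names as df_cv (cutoff, ds, y, yhat). Averaging the errors per horizon is a simplified, unsmoothed version of the line, since Prophet applies a rolling window when it aggregates.

```python
import pandas as pd

# Hypothetical miniature of df_cv: two cutoffs, two forecast days each (made-up values)
cv = pd.DataFrame({
    "cutoff": pd.to_datetime(["2019-07-03", "2019-07-03", "2019-09-01", "2019-09-01"]),
    "ds": pd.to_datetime(["2019-07-04", "2019-07-05", "2019-09-02", "2019-09-03"]),
    "y": [100.0, 100.0, 200.0, 200.0],
    "yhat": [110.0, 120.0, 190.0, 170.0],
})

# Horizon = how far ahead of its cutoff each prediction was made
cv["horizon"] = cv["ds"] - cv["cutoff"]
# Each dot in the chart corresponds to one absolute percentage error
cv["ape"] = (cv["y"] - cv["yhat"]).abs() / cv["y"].abs()

# Averaging the dots per horizon gives a simple version of the line
mape_by_horizon = cv.groupby("horizon")["ape"].mean()
print(mape_by_horizon)
```

In this miniature the average error grows from 7.5% at a one-day horizon to 17.5% at a two-day horizon, the same pattern of increasing error with horizon that the chart shows.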

Summary

In this tutorial, we walked through how to make time-series predictions of Bitcoin prices. The following topics were covered:

How to train a time series forecasting model using Prophet?
How to make predictions and do time series decomposition?
How to identify changing points in the trend?
How to do time series cross-validation?
How to evaluate time series model performance using Prophet?

More tutorials are available on GrabNGoInfo YouTube Channel and GrabNGoInfo.com.
