Multivariate Time Series Forecasting with Seasonality and Holiday Effect Using Prophet in Python
How the time series model performance is impacted by seasonalities, holidays, special events, and additional features
Do you want to build a time series model that incorporates seasonalities, holidays, special events, and other features? In this tutorial, we will talk about how to achieve this using Facebook Prophet in Python. After the tutorial, you will learn:
- How to include seasonalities in time series prediction?
- How to add standard holidays of a country?
- How to add special events such as Super Bowl?
- How to add additional predictors?
- How the time series model performance is impacted by seasonalities, holidays, special events, and additional features?
Resources for this post:
- Video tutorial for this post on YouTube
- Python code is at the end of the post. Click here for the notebook.
- More video tutorials on time-series forecasting
- More blog posts on time-series forecasting
Let’s get started!
Step 1: Install and Import Libraries
In the first step, we will install and import libraries.
yfinance
is the python package for pulling stock data from Yahoo Finance. prophet
is the package for the time series model. After installing yfinance
and prophet
, we will import them into the notebook.
We will also import pandas
and numpy
for data processing, seaborn
and matplotlib
for visualization, and mean_absolute_error
and mean_absolute_percentage_error
for the model performance evaluation.
# Install libraries
!pip install yfinance prophet
# Get time series data
import yfinance as yf
# Prophet model for time series forecast
from prophet import Prophet
# Data processing
import numpy as np
import pandas as pd
# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Model performance evaluation
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
Step 2: Pull Data
The second step pulls stock data from Yahoo Finance API. We will pull 2 years of daily data from the beginning of 2020 to the end of 2021.
start_date = '2020-01-02'
because January 1st is a holiday, and there is no stock data on holidays and weekends.end_date = '2022-01-01'
becauseyfinance
excludes the end date, so we need to add one day to the last day of the data end date.train_end_date = '2021-12-15'
means that the splitting date for the training and testing dataset is December 15th of 2021.
# Data start date
start_date = '2020-01-02'
# Data end date. yfinance excludes the end date, so we need to add one day to the last day of data
end_date = '2022-01-01'
# Date for splitting training and testing dataset
train_end_date = '2021-12-15'
We will download the closing prices for two tickers, GOOG
and VTI
. GOOG
is the ticker for Google, and VTI
is the ticker for the Vanguard Total Stock Market Index Fund ETF.
The goal of the time series model is to predict the closing price of Google stock. The closing price of VTI will be used as an additional predictor.
# Pull close data from Yahoo Finance for the list of tickers
ticker_list = ['GOOG', 'VTI']
data = yf.download(ticker_list, start=start_date, end=end_date)[['Close']]
# Take a look at the data
data.head()

The downloaded dataset is in the pandas dataframe format. We can see that the output has two-level column names. Using droplevel
, the first level column name Close
is dropped.
# Drop one level of the column names
data.columns = data.columns.droplevel(0)
# Take a look at the data
data.head()

Using .info
, we can see that the dataset has 505 records and there are no missing values.
# Information on the dataframe
data.info()
Output
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 505 entries, 2020-01-02 to 2021-12-31
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 GOOG 505 non-null float64
1 VTI 505 non-null float64
dtypes: float64(2)
memory usage: 11.8 KB
Next, let’s visualize the closing prices of the two tickers using seaborn
, and add the legend to the plot using matplotlib
.
# Visualize data using seaborn
sns.set(rc={'figure.figsize':(12,8)})
sns.lineplot(x=data.index, y=data['GOOG'])
sns.lineplot(x=data.index, y=data['VTI'])
plt.legend(['Google', 'VTI'])
We can see that the price for Google increased a lot starting in late 2020, and almost doubled in late 2021.

Step 3: Data Processing
Step 3 transforms the dataset into a time series modeling dataset.
Prophet requires at least two columns as inputs: a ds
column and a y
column.
- The
ds
column has the time information. Currently we have the date as the index, so we reset the index and renamedate
tods
. - The y column has the time series values. In this example, because we are predicting the Google closing price, the column name
GOOG
is changed toy
. - There is no pre-defined name for the additional predictor in prophet, so we can keep the name
VTI
as is.
# Change variable names
data = data.reset_index()
data.columns = ['ds', 'y', 'VTI']
# Take a look at the data
data.head()

Next, let’s check the correlation between the Google stock closing price and the VTI closing price.
# Check correlation
data.corrwith(data["y"])
Output
y 1.000000
VTI 0.967116
dtype: float64
The value of 0.967 is close to 1, indicating that there is a high correlation between the two closing prices. Therefore, VTI closing price is a potential good predictor for the Google closing price.
Step 4: Train Test Split
In step 4, we will do the train test split. For time series data, usually a threshold date is chosen, then we set the dates before the threshold to be the training dataset and the dates after the threshold to be the testing dataset.
Based on the threshold date (train_end_date
) we set before, there are 494 data points in the training dataset and 11 data points in the testing dataset.
# Train test split
train = data[data['ds'] <= train_end_date]
test = data[data['ds'] > train_end_date]
# Check the shape of the dataset
print(train.shape)
print(test.shape)
Output
(494, 3)
(11, 3)
Checking the minimum and maximum values for the train and test dataset separately gave us the starting and ending dates.
The training dataset has the dates from January 2nd, 2020 to December 15th, 2021, and the testing dataset has the dates from December 16th, 2021 to December 31, 2021.
# Check the start and end time of the training and testing dataset
print('The start time of the training dataset is ', train['ds'].min())
print('The end time of the training dataset is ', train['ds'].max())
print('The start time of the testing dataset is ', test['ds'].min())
print('The end time of the testing dataset is ', test['ds'].max())
Step 5: Baseline Model
In step 5, we will build a univariate baseline model using the default prophet hyperparameters, and fit the model using the training dataset.
# Use the default hyperparameters to initiate the Prophet model
model_baseline = Prophet()
# Fit the model on the training dataset
model_baseline.fit(train)
Prophet automatically fits daily, weekly, and yearly seasonalities if the time series is more than two cycles long.
The model information shows that the yearly seasonality and the daily seasonality are disabled.
- The daily seasonality is disabled because we do not have sub-daily time series.
- The yearly seasonality is disabled because we do not have two full years of data in the training dataset.
INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
<prophet.forecaster.Prophet at 0x7fa2c4105610>
We will continue with the default values for the baseline model and force the yearly seasonality in the next model to see the impact of the yearly seasonality.
To make a forecast, we first need to create a future dataframe. periods=16
means that we would like to make predictions for the next 16 days.
If you remember that the testing dataset has 11 data points, you might wonder why we put 16 instead of 11 as the number of periods here. That is because there are 5 weekends and holidays from December 16th to December 31st in 2021. We need to add those days back in order to get the predictions till December 31st of 2021.
# Create the time range for the forecast
future_baseline = model_baseline.make_future_dataframe(periods=16)
# Make prediction
forecast_baseline = model_baseline.predict(future_baseline)
# Visualize the forecast
model_baseline.plot(forecast_baseline); # Add semi-colon to remove the duplicated chart
After making the prediction on the future dataframe, we can plot the result using .plot
.
- The black dots are the actual values.
- The blue line is the prediction.
- The blue shades are the uncertainty interval. The default value for the uncertainty interval is 80%, so we are using 80% here. The uncertainty interval is calculated based on the assumption that the average frequency and magnitude of trend changes in the future will be the same as the historical data. The historical data trend changes are projected forward to get the uncertainty intervals [1].

In addition to the forecast plot, prophet also provides the components plot.
From the component plot chart, we can see that the Google stock closing price has an overall upward trend. The weekly seasonality shows that the price tends to be lower at the beginning of the week and higher at the end of the week.
# Visualize the forecast components
model_baseline.plot_components(forecast_baseline);

Next, let’s check the model performance.
The forecast dataframe does not include the actual values, so we need to merge the forecast dataframe with the test dataframe to compare the actual values with the predicted values.
# Merge actual and predicted values
performance_baseline = pd.merge(test, forecast_baseline[['ds', 'yhat', 'yhat_lower', 'yhat_upper']][-16:], on='ds')
# Check MAE value
performance_baseline_MAE = mean_absolute_error(performance_baseline['y'], performance_baseline['yhat'])
print(f'The MAE for the baseline model is {performance_baseline_MAE}')
# Check MAPE value
performance_baseline_MAPE = mean_absolute_percentage_error(performance_baseline['y'], performance_baseline['yhat'])
print(f'The MAPE for the baseline model is {performance_baseline_MAPE}')
The mean absolute error (MAE) for the baseline model is $84, meaning that on average, the forecast is off by $84. Given the Google price of nearly $3000, the prediction is not bad.
The mean absolute percent error (MAPE) for the baseline model is 2.9%, meaning that on average, the forecast is off by 2.9% of the stock price.
The MAE for the baseline model is 84.12552420889001
The MAPE for the baseline model is 0.02904474676076865
Step 6: Add Seasonality to Baseline Model
The baseline model already gives us good estimations. Can we tune the model to make the estimations better? In step 6, we will force the model to consider the yearly seasonality.
When initiating the prophet model, the yearly_seasonality
and weekly_seasonality
are explicitly set to True, and then fit on the training data.
# Add seasonality
model_season = Prophet(yearly_seasonality=True, weekly_seasonality=True)
# Fit the model on the training dataset
model_season.fit(train)
The forecast plot is much better than the baseline model. The predictions are more aligned with the actual values, and the uncertainty interval is much narrower.
# Create the time range for the forecast
future_season = model_season.make_future_dataframe(periods=16)
# Make prediction
forecast_season = model_season.predict(future_season)
# Visualize the forecast
model_season.plot(forecast_season); # Add semi-colon to remove the duplicated chart

For the components plot, besides trend and weekly seasonality, there is a yearly seasonality chart, showing the seasonality over the year.
# Visualize the forecast components
model_season.plot_components(forecast_season);

The model performance is better than the baseline model.
- MAE decreased to $60 from the baseline model’s $84.
- MAPE decreased to 2% from the baseline model’s 2.9%.
# Merge actual and predicted values
performance_season = pd.merge(test, forecast_season[['ds', 'yhat', 'yhat_lower', 'yhat_upper']][-16:], on='ds')
# Check MAE value
performance_season_MAE = mean_absolute_error(performance_season['y'], performance_season['yhat'])
print(f'The MAE for the seasonality model is {performance_season_MAE}')
# Check MAPE value
performance_season_MAPE = mean_absolute_percentage_error(performance_season['y'], performance_season['yhat'])
print(f'The MAPE for the seasonality model is {performance_season_MAPE}')
Output
The MAE for the seasonality model is 60.64578181306512
The MAPE for the seasonality model is 0.020749685375986347
Step 7: Multivariate Model
In step 7, we added the VTI price as an additional predictor using the add_regressor
function. standardize=False
means the regressor will not be standardized.
VTI is chosen as a convenient example of illustrating the process of building a multivariate model. In practice, multiple features need to be created and evaluated as feature engineering and feature selection steps for the model.
# Add seasonality
model_multivariate = Prophet(yearly_seasonality=True, weekly_seasonality=True)
# Add regressor
model_multivariate.add_regressor('VTI', standardize=False)
# Fit the model on the training dataset
model_multivariate.fit(train)
When making forecasts for the multivariate model, we need to make sure that the regressors have values for the forecast periods, so we used left join and appended VTI data to the future dataframe.
In the case that the forecast is for the future without the regressor data, separate models need to be built for the regressors to get the predictions for the future dates.
If there are missing values, we can use .fillna(method='ffill')
to fill the missing value with the previous day's data.
# Create the time range for the forecast
future_multivariate = model_multivariate.make_future_dataframe(periods=16)
# Append the regressor values
future_multivariate = pd.merge(future_multivariate, data[['ds', 'VTI']], on='ds', how='inner')
# Fill the missing values with the previous value
future_multivariate = future_multivariate.fillna(method='ffill')
# Check the data
future_multivariate.tail(10)

Compared with the model with no additional regressor, we can see that the model with a regressor has the predictions more aligned with the actual values.
# Make prediction
forecast_multivariate = model_multivariate.predict(future_multivariate)
# Visualize the forecast
model_multivariate.plot(forecast_multivariate); # Add semi-colon to remove the duplicated chart

The components plot has one additional chart for the additional regressor.
# Visualize the forecast components
model_multivariate.plot_components(forecast_multivariate);

The multivariate model performance is much better than the univariate model.
- MAE decreased to $31 from the univariate seasonality model’s $60.
- MAPE decreased to 1% from the univariate seasonality model’s 2%.
# Merge actual and predicted values
performance_multivariate = pd.merge(test, forecast_multivariate[['ds', 'yhat', 'yhat_lower', 'yhat_upper']][-16:], on='ds')
# Check MAE value
performance_multivariate_MAE = mean_absolute_error(performance_multivariate['y'], performance_multivariate['yhat'])
print(f'The MAE for the multivariate model is {performance_multivariate_MAE}')
# Check MAPE value
performance_multivariate_MAPE = mean_absolute_percentage_error(performance_multivariate['y'], performance_multivariate['yhat'])
print(f'The MAPE for the multivariate model is {performance_multivariate_MAPE}')
Output
The MAE for the multivariate model is 31.372078294947972
The MAPE for the multivariate model is 0.010728081175901991
Step 8: Model with Holiday Effect and Event Effect
Besides seasonalities and additional predictors, we can also incorporate holidays and special events in a prophet time series model. In step 8, we will add special events and US holidays to the multivariate model and see how it impacts the model predictions.
Firstly, let’s create two special events: COVID start and Super Bowl.
- For the COVID start event, we set the date to be March 15th of 2020, then extend the event to 15 days before and 15 days after using
lower_window
andupper_window
separately. - For the Super Bowl event, we set the date in 2020 and 2021 separately, and extend the event to 7 days before and 1 day after.
Then the two events are concatenated together to one dataframe called events
.
# COVID time window
COVID = pd.DataFrame({
'holiday': 'COVID',
'ds': pd.to_datetime(['2020-03-15']),
'lower_window': -15,
'upper_window': 15,
})
# Super Bowl time window
superbowl = pd.DataFrame({
'holiday': 'superbowl',
'ds': pd.to_datetime(['2020-02-02', '2021-02-07']),
'lower_window': -7,
'upper_window': 1,
})
# Combine all events
events = pd.concat((COVID, superbowl))
# Take a look at the events data
events

When initiating the prophet model, the dataframe name events
is passed to the hyperparameter holidays
.
Prophet has built-in country-specific holidays. We add US holidays to the model using add_country_holidays
and setting country_name='US'
.
# Add holidays
model_holiday = Prophet(yearly_seasonality=True, weekly_seasonality=True, holidays=events)
# Add built-in country-specific holidays
model_holiday.add_country_holidays(country_name='US')
# Add regressor
model_holiday.add_regressor('VTI', standardize=False)
# Fit the model on the training dataset
model_holiday.fit(train)
# All the holidays and events
model_holiday.train_holiday_names
Using .train_holiday_names
, we can see the list of holidays and events used for the modeling fitting.
0 COVID
1 superbowl
2 New Year's Day
3 Martin Luther King Jr. Day
4 Washington's Birthday
5 Memorial Day
6 Independence Day
7 Independence Day (Observed)
8 Labor Day
9 Columbus Day
10 Veterans Day
11 Thanksgiving
12 Christmas Day
13 New Year's Day (Observed)
14 Christmas Day (Observed)
When doing a forecast, the holiday and special events effects are included.
# Create the time range for the forecast
future_holiday = model_holiday.make_future_dataframe(periods=16)
# Append the regressor values
future_holiday = pd.merge(future_holiday, data[['ds', 'VTI']], on='ds', how='inner')
# Fill the missing values with the previous value
future_holiday = future_holiday.fillna(method='ffill')
# Make prediction
forecast_holiday = model_holiday.predict(future_holiday)
# Visualize the forecast
model_holiday.plot(forecast_holiday); # Add semi-colon to remove the duplicated chart

The components plot has one additional plot for the holidays including special events.
# Visualize the forecast components
model_holiday.plot_components(forecast_holiday);

The added holiday effects did not improve the performance of the multivariate model.
- MAE increased to $48 from the multivariate model’s $31.
- MAPE increased to 1.6% from the multivariate model’s 1%.
# Merge actual and predicted values
performance_holiday = pd.merge(test, forecast_holiday[['ds', 'yhat', 'yhat_lower', 'yhat_upper']][-16:], on='ds')
# Check MAE value
performance_holiday_MAE = mean_absolute_error(performance_holiday['y'], performance_holiday['yhat'])
print(f'The MAE for the holiday/event model is {performance_holiday_MAE}')
# Check MAPE value
performance_holiday_MAPE = mean_absolute_percentage_error(performance_holiday['y'], performance_holiday['yhat'])
print(f'The MAPE for the holiday/event model is {performance_holiday_MAPE}')
Output
The MAE for the holiday/event model is 47.51193003230161
The MAPE for the holiday/event model is 0.016239759857200307
Summary
In this tutorial, you learned:
- How to include seasonalities in time series prediction using Prophet?
- How to add standard holidays of a country?
- How to add special events such as Super Bowl?
- How to add additional predictors?
- How the time series model performance is impacted by seasonalities, holidays, special events, and additional features?
To learn about how to do time series cross-validation using prophet, please refer to my previous tutorial Time Series Forecasting Of Bitcoin Prices Using Prophet.
For more information about data science and machine learning, please check out my YouTube channel and Medium Page.