Time Series Anomaly Detection Using Prophet in Python
How to train a time series model, make predictions, and identify outliers using a Prophet model
This tutorial covers time series anomaly detection using the Facebook (Meta) Prophet model in Python. Anomalies are also called outliers, and the two terms are used interchangeably in this tutorial. By the end of the tutorial, you will know:
- How to train a time series model using Prophet
- How to make predictions using a Prophet model
- How to identify outliers using a Prophet time series forecast
Let’s get started!
Step 0: Algorithm for Time Series Anomaly Detection
In step 0, let’s talk about the algorithm for time series anomaly detection. At a high level, the outliers are detected based on the prediction interval of the time series. The implementation includes the following steps:
- Build a time series forecasting model.
- Make predictions on historical data using the time series forecasting model.
- Compare the actual values with the prediction intervals. Outliers are defined as data points with actual values outside of the prediction intervals.
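The three steps above can be sketched without Prophet at all. The example below is only an illustration under stated assumptions: the series is made up, and the "model" is a deliberately simple stand-in (a constant mean with a normal-approximation 99% interval), not the Prophet model used in the rest of this tutorial.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical series: stable values with one obvious spike at the end
values = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 10, 12, 40]

# Step 1: "fit" the stand-in model (a constant mean and a spread estimate)
center = mean(values)
spread = stdev(values)

# Step 2: build a 99% prediction interval for the historical points
z = NormalDist().inv_cdf(0.995)  # two-sided 99% interval
lower = center - z * spread
upper = center + z * spread

# Step 3: flag the points whose actual values fall outside the interval
outliers = [(i, v) for i, v in enumerate(values) if v < lower or v > upper]
print(f"Interval: [{lower:.2f}, {upper:.2f}]")
print(f"Outliers (index, value): {outliers}")
```

Prophet replaces the constant mean with a trend plus seasonality forecast and produces a separate interval for each date, but the comparison in the last step stays the same.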
Step 1: Install and Import Libraries
In the first step, we will install and import libraries.
yfinance is the Python package for pulling stock data from Yahoo Finance. prophet is the package for the time series model. After installing yfinance and prophet, they are imported into the notebook.
We also import pandas and numpy for data processing, seaborn and matplotlib for visualization, and mean_absolute_error and mean_absolute_percentage_error for the model performance evaluation.
# Install libraries
!pip install yfinance prophet
# Get time series data
import yfinance as yf
# Prophet model for time series forecast
from prophet import Prophet
# Data processing
import numpy as np
import pandas as pd
# Visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Model performance evaluation
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
Step 2: Pull Data
The second step pulls stock data from the Yahoo Finance API. Two years of daily data, from the beginning of 2020 to the end of 2021, are pulled for this analysis.
- start_date = '2020-01-02' because January 1st is a holiday, and there is no stock data on holidays and weekends.
- end_date = '2022-01-01' because yfinance excludes the end date, so we need to add one day to the last day of the data.
# Data start date
start_date = '2020-01-02'
# Data end date. yfinance excludes the end date, so we need to add one day to the last day of data
end_date = '2022-01-01'
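If you prefer not to hard-code the shifted end date, it can be derived from the last day you want included. This is a small sketch using only the standard library. The variable names first_trading_day and last_day_wanted are illustrative and not part of the original notebook.

```python
from datetime import date, timedelta

# Illustrative names: the first and last calendar days we want in the data
first_trading_day = date(2020, 1, 2)
last_day_wanted = date(2021, 12, 31)

# The end date is exclusive, so shift it forward by one day
start_date = first_trading_day.isoformat()
end_date = (last_day_wanted + timedelta(days=1)).isoformat()

print(start_date, end_date)
```

The resulting strings match the values assigned by hand above.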
The goal of the time series model is to predict the closing price of Google’s stock, so Google’s ticker GOOG is used for pulling the data.
Prophet requires at least two columns as inputs: a ds column and a y column.
- The ds column has the time information. Currently we have the date as the index, so we reset the index and rename date to ds.
- The y column has the time series values. In this example, because we are predicting Google’s closing price, the column name for the price is changed to y.
# Pull close data from Yahoo Finance for the list of tickers
ticker_list = ['GOOG']
data = yf.download(ticker_list, start=start_date, end=end_date)[['Close']]
# Change column names
data = data.reset_index()
data.columns = ['ds', 'y']
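# Remove timezone from timestamp - this code was added after a yfinance package update changed the date format
# tz_localize(None) drops the timezone if one is present and leaves timezone-naive dates unchanged
data['ds'] = data['ds'].dt.tz_localize(None)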
# Take a look at the data
data.head()
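To see the reshaping step in isolation, without a network call, here is a minimal sketch on a hand-built dataframe. The prices are made-up placeholders, and the sketch assumes flat column labels named Date and Close. The labels that yfinance actually returns can differ between versions, which is why the code above assigns the column names by position instead.

```python
import pandas as pd

# Hypothetical closing prices indexed by date (placeholder values, not real quotes)
prices = pd.DataFrame(
    {"Close": [101.0, 102.5, 100.8]},
    index=pd.DatetimeIndex(["2020-01-02", "2020-01-03", "2020-01-06"], name="Date"),
)

# Move the date index into a column, then rename by label rather than by position
prophet_input = prices.reset_index().rename(columns={"Date": "ds", "Close": "y"})

print(prophet_input)
print(prophet_input.dtypes)
```

Renaming by label fails loudly if an expected column is missing, while assigning by position silently relabels whatever columns happen to be there.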

Using .info(), we can see that the dataset has 505 records and there are no missing values.
# Information on the dataframe
data.info()
Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 505 entries, 0 to 504
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   ds      505 non-null    datetime64[ns]
 1   y       505 non-null    float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 8.0 KB
Next, let’s visualize the closing price using seaborn, and add the legend to the plot using matplotlib. We can see that the price for Google increased a lot starting in late 2020, and by late 2021 it had almost doubled.
# Visualize data using seaborn
sns.set(rc={'figure.figsize':(12,8)})
sns.lineplot(x=data['ds'], y=data['y'])
plt.legend(['Google'])

Step 3: Build Time Series Model Using Prophet in Python
In step 3, we will build a time series model using Prophet in Python.
Notice that we did not do a train-test split for the modeling dataset. This is because the goal of the model is not to predict future stock prices. Instead, the goal is to fit a model that predicts well on the past prices. Therefore, we will use the whole dataset for both training and forecasting.
- When initiating the Prophet model, yearly_seasonality and weekly_seasonality are explicitly set to True, and the model is then fit on the training data.
- The interval_width is set to 0.99, which means that the uncertainty interval is 99%.
We keep the model simple in this example to focus on the process of anomaly detection. If you are interested in building a sophisticated model, please refer to my previous tutorial Multivariate Time Series Forecasting with Seasonality and Holiday Effect Using Prophet in Python.
# Add seasonality
model = Prophet(interval_width=0.99, yearly_seasonality=True, weekly_seasonality=True)
# Fit the model on the training dataset
model.fit(data)
Step 4: Make Predictions Using Prophet in Python
After building the model, in step 4, we use the model to make predictions on the dataset. The forecast plot shows that the predictions are in general aligned with the actual values.
# Make prediction
forecast = model.predict(data)
# Visualize the forecast
model.plot(forecast); # Add semi-colon to remove the duplicated chart

We can also check the components plot for the trend, weekly seasonality, and yearly seasonality.
# Visualize the forecast components
model.plot_components(forecast);

Step 5: Check Time Series Model Performance
In step 5, we will check the time series model performance. The forecast dataframe does not include the actual values, so we need to merge the forecast dataframe with the actual dataframe to compare the actual values with the predicted values. Two performance metrics are included:
- MAE (Mean Absolute Error) is the sum of the absolute differences between the actual and predicted values, divided by the number of predictions.
- MAPE (Mean Absolute Percentage Error) is the sum of the absolute percentage differences between the actual and predicted values, divided by the number of predictions. MAPE is independent of the magnitude of the data, so it can be used to compare different forecasts. But it’s undefined when the actual value is zero.
For more time series performance evaluation metrics such as MSE (Mean Squared Error), RMSE (Root Mean Square Error), MDAPE (Median Absolute Percentage Error), and SMAPE (Symmetric Mean Absolute Percentage Error), please refer to my previous tutorial Time Series Forecasting Of Bitcoin Prices Using Prophet.
# Merge actual and predicted values
performance = pd.merge(data, forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']], on='ds')
# Check MAE value
performance_MAE = mean_absolute_error(performance['y'], performance['yhat'])
print(f'The MAE for the model is {performance_MAE}')
# Check MAPE value
performance_MAPE = mean_absolute_percentage_error(performance['y'], performance['yhat'])
print(f'The MAPE for the model is {performance_MAPE}')
The mean absolute error (MAE) for the model is about $31, meaning that on average, the forecast is off by $31. Given that Google’s price is in the thousands of dollars, the prediction is not bad.
The mean absolute percentage error (MAPE) for the model is 1.7%, meaning that on average, the forecast is off by 1.7% of the stock price.
The MAE for the model is 31.490791238759932
The MAPE for the model is 0.01699185966792339
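To make the two definitions concrete, here is a minimal hand computation on three made-up points. The values are hypothetical and chosen so the arithmetic is easy to follow. Note that the scikit-learn MAPE function returns a fraction rather than a percentage, which is why the output above reads 0.0169 for 1.7%.

```python
# Hypothetical actual and predicted values
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

n = len(actual)

# MAE: average of the absolute differences
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# MAPE: average of the absolute differences relative to the actual values
# (undefined if any actual value is zero)
mape = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n

print(f"MAE: {mae:.2f}")
print(f"MAPE: {mape:.4f} ({mape:.2%})")
```

Here the absolute errors are 10, 10, and 20, and the relative errors are 10%, 5%, and 5%.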
Step 6: Identify Anomalies
In step 6, we will identify the time series anomalies by checking if the actual value is outside of the uncertainty interval. If the actual value is smaller than the lower bound or larger than the upper bound of the uncertainty interval, the anomaly indicator is set to 1; otherwise, it’s set to 0.
Using value_counts(), we can see that there are 6 outliers out of 505 data points.
# Create an anomaly indicator
performance['anomaly'] = ((performance['y'] < performance['yhat_lower']) | (performance['y'] > performance['yhat_upper'])).astype(int)
# Check the number of anomalies
performance['anomaly'].value_counts()
Output
0 499
1 6
Name: anomaly, dtype: int64
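As a rough sanity check on that count, we can compare it with what a 99% interval would imply. The arithmetic below assumes the interval is perfectly calibrated, which is only an approximation here because the model is evaluated on the same data it was fit on.

```python
# Numbers taken from this tutorial
n_points = 505
interval_width = 0.99
n_flagged = 6

# Points expected outside the interval if it were perfectly calibrated
expected_outside = n_points * (1 - interval_width)

# Share of points actually flagged
observed_share = n_flagged / n_points

print(f"Expected outside the interval: about {expected_outside:.2f} points")
print(f"Observed share flagged: {observed_share:.2%}")
```

About 5 points would be expected outside the interval, so 6 flagged points is in line with the chosen interval_width. A narrower interval would flag more points, and a wider one fewer.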
After printing out the anomalies, we can see that all the outliers are lower than the lower bound of the uncertainty interval.
# Take a look at the anomalies
anomalies = performance[performance['anomaly']==1].sort_values(by='ds')
anomalies
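Besides reading the table by eye, the direction and size of each violation can be computed directly. The sketch below runs on a small made-up dataframe with the same columns as performance. The direction and distance column names are illustrative additions and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same columns as `performance`
perf = pd.DataFrame({
    "ds": pd.date_range("2021-01-04", periods=5, freq="D"),
    "y": [100.0, 90.0, 101.0, 115.0, 99.0],
    "yhat_lower": [95.0, 95.0, 95.0, 95.0, 95.0],
    "yhat_upper": [105.0, 105.0, 105.0, 105.0, 105.0],
})

below = perf["y"] < perf["yhat_lower"]
above = perf["y"] > perf["yhat_upper"]

# Label which side of the interval each point falls on
perf["direction"] = np.select([below, above], ["below", "above"], default="inside")

# Distance from the violated bound (0 for points inside the interval)
perf["distance"] = np.select(
    [below, above],
    [perf["yhat_lower"] - perf["y"], perf["y"] - perf["yhat_upper"]],
    default=0.0,
)

print(perf[["ds", "y", "direction", "distance"]])
```

Applied to the real performance dataframe, a check such as (anomalies['y'] < anomalies['yhat_lower']).all() would confirm whether every outlier sits below the lower bound.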

In the visualization, all the dots are actual values and the black line represents the predicted values. The orange dots are the outliers.
# Visualize the anomalies
sns.scatterplot(x='ds', y='y', data=performance, hue='anomaly')
sns.lineplot(x='ds', y='yhat', data=performance, color='black')

Summary
In this tutorial, we discussed how to perform time series anomaly detection using Prophet in Python. You learned:
- How to train a time series model using Prophet
- How to make predictions using a Prophet model
- How to identify outliers using a Prophet time series forecast
More tutorials are available on GrabNGoInfo YouTube Channel and GrabNGoInfo.com.