What is SARIMA in Time Series Forecasting

A deep dive into the SARIMA model and its applications in time series analysis

Background

In one of my previous posts we covered probably the most famous forecasting model, the Autoregressive Integrated Moving Average, better known as ARIMA. However, one disadvantage of this model is that it has no awareness of seasonality. This is where the Seasonal Autoregressive Integrated Moving Average, or SARIMA, model comes in. In this post, we will take a deep dive into the theory and main ideas behind the SARIMA model and how to implement it in Python.

I highly recommend reading my previous article if you are not too familiar with ARIMA, as this article assumes quite a lot of prior knowledge of how the original ARIMA model works!

What Is SARIMA?

Overview

SARIMA is an extension of the regular ARIMA model that adds a seasonality component. This allows us to capture seasonal effects that the regular ARIMA model cannot.

If you want to learn more about seasonality in time series, I highly recommend you read one of my previous posts:

Seasonality of Time Series

An intuition of how seasonality affects time series analysis

towardsdatascience.com

Theory

The classic ARIMA model has three components: Autoregressive, Integrated (differencing), and Moving-Average. These are then linearly combined to form the model:
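
One common way to write this linear combination, using the symbols defined below, is sketched here. The constant term c and the summation indices i and j are my own notation and are not part of the symbol list that follows:

```latex
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```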

Where:

y’: differenced time series, where the number of times differencing is applied is denoted d
ϕ: coefficients of the autoregressive components (lags)
p: number of autoregressive components
ε: forecast error terms, the moving-average components
θ: coefficients of the lagged forecast errors
q: number of lagged error components

The model is often compactly written ARIMA(p, d, q) where p, d, and q refer to the order of autoregressors, differencing and moving-average components respectively.

SARIMA adds a seasonality component to each factor of the ARIMA equation to produce SARIMA(p, d, q)(P, D, Q)m:
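
As a rough sketch, the seasonal terms can be added to the ARIMA equation using the symbols defined below. The constant c and the indices i and j are again my own notation. Note that this additive form is a simplification: the usual textbook formulation multiplies the seasonal and non-seasonal lag polynomials, which is shown in the second line using the backshift operator B (where B y_t = y_{t-1}):

```latex
% Simplified additive form: seasonal terms sit at multiples of the season length m
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
         + \sum_{i=1}^{P} \omega_i \, y'_{t-mi} + \sum_{j=1}^{Q} \eta_j \, \varepsilon_{t-mj} + \varepsilon_t

% Multiplicative form in backshift notation
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr)\Bigl(1 - \sum_{i=1}^{P} \omega_i B^{mi}\Bigr)(1 - B)^{d}(1 - B^{m})^{D} \, y_t
  = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr)\Bigl(1 + \sum_{j=1}^{Q} \eta_j B^{mj}\Bigr)\varepsilon_t
```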

Where:

y’: differenced time series, through both regular, d, and seasonal, D, differencing
P: number of seasonal auto-regressors
ω: coefficients of the seasonal autoregressive components
Q: number of seasonal moving-average components
η: coefficients of the seasonal forecast errors
m: length of season

How To Forecast Time-Series Using Autoregression

Tutorial on how to forecast using an autoregressive model in Python

towardsdatascience.com

How To Forecast With Moving Average Models

Tutorial and theory on how to carry out forecasts with moving average models

towardsdatascience.com

Requirements

Like the original ARIMA model, the SARIMA model needs stationary data to model and forecast the time series. A stationary time series does not exhibit any long-term trend or clear seasonality; its statistical properties, such as mean and variance, remain constant over time.

To produce a stationary time series we need to stabilize the mean and variance. The mean can be stabilized through differencing, where the number of times differencing is applied is d, or D in the case of seasonal differencing. The variance can be stabilized through transformations such as the logarithm and the Box-Cox transform, which make the seasonal fluctuations occur on a similar scale every season.
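
To make the two kinds of differencing and the effect of a log transform concrete, here is a minimal sketch on made-up numbers. The arrays `y` and `z` and the season length `m = 4` are hypothetical toy values chosen so the arithmetic is easy to follow:

```python
import numpy as np

# Hypothetical toy series: a pattern that repeats every m = 4 steps and
# rises by 4 from one season to the next.
m = 4
y = np.array([10.0, 14.0, 12.0, 8.0, 14.0, 18.0, 16.0, 12.0, 18.0, 22.0, 20.0, 16.0])

# Regular differencing (d = 1): y_t - y_{t-1}
regular_diff = np.diff(y)

# Seasonal differencing (D = 1): y_t - y_{t-m}
seasonal_diff = y[m:] - y[:-m]

print(regular_diff)   # the seasonal pattern is still visible
print(seasonal_diff)  # constant: the repeating pattern has been removed

# Hypothetical series whose swings grow with its level (multiplicative effect).
z = np.array([5.0, 20.0, 50.0, 200.0])
raw_swings = z[1::2] - z[::2]                   # swing size grows with the level
log_swings = np.log(z[1::2]) - np.log(z[::2])   # equal swings after the log

print(raw_swings, log_swings)
```

The log transform is the special case of the Box-Cox transform with lambda equal to zero, which is why both appear together when stabilizing the variance.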

If you want to learn more about stationarity, check out my previous blog posts about it here:

Time-Series Stationarity Simply Explained

A simple and intuitive explanation for the need of stationarity in time-series modelling.

towardsdatascience.com

Box-Cox Transform for Time Series

How to create a stationary time series using the Box-Cox transformation.

towardsdatascience.com

Order Selection

Once the time series is stationary, we need to deduce the best orders, (p, d, q) and (P, D, Q)m, for our model. The simplest to determine are the seasonal differencing order, D, and the regular differencing order, d. These can be chosen with the help of the Augmented Dickey-Fuller (ADF) statistical test, which tests whether a time series is stationary or not: we keep differencing until the test indicates stationarity.

The autoregressive and moving-average (forecast error) orders (p, P and q, Q) can be deduced by analyzing the partial autocorrelation function (PACF) and autocorrelation function (ACF) respectively. The idea behind this technique is to plot a correlogram for each function and see which lags are statistically significant. The significant lags are the ones likely to have a substantial impact on the forecast.

These correlograms also allow us to observe any seasonal pattern, as we may see peaks at multiples of the seasonal lag. For example, a SARIMA(0,0,0)(1,0,0)4 will show exponential decay at the seasonal lags (4, 8, 12, ...) in the ACF but a single significant spike at lag 4 in the PACF. If the data is indexed by quarter, this would be an example of yearly seasonality.

If this seems confusing at the moment, don’t worry. In the Python implementation later we will walk through this process!

Autocorrelation For Time Series Analysis

Describing what autocorrelation is and why it is useful in time series analysis.

towardsdatascience.com

Partial Autocorrelation for Time Series Analysis

Describing what partial autocorrelation is and its importance in time series analysis

towardsdatascience.com

Estimation

The final step is to compute the coefficients for these orders. The most common method is Maximum Likelihood Estimation (MLE), which assumes a probability distribution for the errors, typically normal, and finds the coefficient values that make the observed data most likely. As the time series is stationary and has constant statistical properties, we can say that it belongs to some probability distribution, allowing us to use MLE. This is why stationarity is the key requirement for SARIMA.
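
To give a feel for what MLE does, here is a deliberately simplified sketch that estimates the single coefficient of an AR(1) process by minimizing a Gaussian negative log-likelihood. It conditions on the first observation, which is a simplification and not how statsmodels estimates a full SARIMA model; the simulated data and all names are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: an AR(1) process y_t = phi * y_{t-1} + eps_t
rng = np.random.default_rng(1)
true_phi, n = 0.6, 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + rng.normal()

def neg_log_likelihood(params, y):
    """Gaussian negative log-likelihood, conditional on the first observation."""
    phi, log_sigma = params
    sigma = np.exp(log_sigma)          # optimizing log(sigma) keeps sigma positive
    residuals = y[1:] - phi * y[:-1]   # one-step-ahead forecast errors
    return (0.5 * len(residuals) * np.log(2 * np.pi * sigma**2)
            + np.sum(residuals**2) / (2 * sigma**2))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
phi_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(f"estimated phi = {phi_hat:.3f}, estimated sigma = {sigma_hat:.3f}")
```

The estimates should be close to the values used in the simulation (0.6 for the coefficient and 1 for the error standard deviation).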

Josh Starmer’s StatQuest gives a great explanation of MLE. Link here.

Python Tutorial

Data

Let’s begin by plotting the time series we want to forecast:

Data from Kaggle with a CC0 licence.

There is an obvious trend and seasonality, so the data is not stationary, as the mean and variance are changing over time. Therefore, we need to apply differencing and the Box-Cox transform to make our series stationary as required by SARIMA:
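
A minimal sketch of this step is below. Since the dataset itself is not reproduced here, the code builds a synthetic monthly series with a trend and growing seasonal swings as a stand-in; the series, its name and its parameters are all assumptions, so swap in the real data when following along:

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox

# Stand-in data (assumption): 12 years of monthly values with an upward
# trend and seasonal swings that grow with the level of the series.
rng = np.random.default_rng(0)
t = np.arange(144)
values = (100 + 2.5 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)) * rng.normal(1.0, 0.03, 144)
series = pd.Series(values, index=pd.date_range("2000-01-01", periods=144, freq="MS"))

# Box-Cox transform to stabilize the variance; scipy picks lambda by
# maximum likelihood when it is not supplied. The data must be positive.
transformed, lam = boxcox(series.to_numpy())
boxcox_series = pd.Series(transformed, index=series.index)

# One regular difference and one seasonal difference (lag 12) to stabilize the mean.
stationary = boxcox_series.diff().diff(12).dropna()

print(f"Box-Cox lambda: {lam:.3f}")
print(stationary.head())
```

Differencing once and seasonally once drops the first 13 observations, so the resulting series is slightly shorter than the original.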

The data now looks sufficiently stationary.

Modelling

We will now use the ACF and PACF correlograms to deduce the orders for the autoregressive and moving-average components:

The blue region signifies where the lags are no longer statistically significant.

We already observed that our series has yearly seasonality, m=12, and the above plots confirm this as we have large spikes at the 12th lag. The lags are also significant up to around the 10th lag in both plots. Overall, this indicates that a SARIMA(10, 1, 10)(1, 1, 1)12 model should be suitable.

Now, let’s fit the model using the ARIMA class from statsmodels and generate the forecasts. Luckily, this class carries out differencing for us, so we only need to pass the Box-Cox transformed time series:

Analysis

Finally, we will plot the forecasts:
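
A plot along these lines can be sketched with matplotlib. The helper `plot_forecast` and the placeholder data are hypothetical; in practice `train`, `test` and `forecast` would come from the fitting step:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_forecast(train, test, forecast):
    """Plot the training data, the held-out data and the forecasts on one axis."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(train.index, train.values, label="Train")
    ax.plot(test.index, test.values, label="Test")
    ax.plot(forecast.index, forecast.values, linestyle="--", label="Forecast")
    ax.set_xlabel("Date")
    ax.set_ylabel("Value")
    ax.legend()
    return fig, ax

# Placeholder data (assumption) standing in for the real series and model output.
index = pd.date_range("2000-01-01", periods=36, freq="MS")
values = pd.Series(np.linspace(100, 170, 36) + 10 * np.sin(2 * np.pi * np.arange(36) / 12),
                   index=index)
train, test = values.iloc[:24], values.iloc[24:]
forecast = test + 3.0  # placeholder, not a real forecast

fig, ax = plot_forecast(train, test, forecast)
```

Call `plt.show()` to display the figure interactively, or `fig.savefig(...)` to write it to a file.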

The SARIMA forecasts seem to have done quite well!

Summary and Further Thoughts

In this article, we have discussed an extension to the famous ARIMA forecasting model, SARIMA. This model adds seasonality components to the regular ARIMA model to enable the modeling of more complex time series. The SARIMA model is simple to apply in Python through the statsmodels package.

The full code used in this article can be found on my GitHub here:

Medium-Articles/sarima.py at main · egorhowell/Medium-Articles

github.com

References and Further Reading

Forecasting: Principles and Practice: https://otexts.com/fpp2/
