ARIMA for dummies

While developing a reinforcement learning model at work, I came across an autoregressive model used to update the policy of an RL agent. This activated a deep, rarely visited part of my brain: the “already learned” part. I remembered that I had written a blog post on ARIMA, which combines an autoregressive model with a moving average model. I thought it would be a good idea to recap my understanding and bring the blog back into the light. So here it goes.
Before going into ARIMA, we should recap what a “time series” is.
Time Series
Data points observed at specified times, usually at equal intervals, are referred to as time series data. Time series matter in real life because much of the data we collect is recorded in time order. Ex: stock prices recorded every second.
Time series analysis is used to forecast the future. For example, we can use the past 12 months of sales data to predict sales for the next n months and act accordingly.
Four components explain time series data:
- Trend: upward, downward, or flat. If your company’s sales increase every year, they show an upward trend.
- Seasonality: a pattern that repeats over a fixed period. Ex: the difference between summer and winter, including special holidays.
- Irregularity: unpredictable external events that affect the data, such as Covid or natural disasters.
- Cyclic: rises and falls that repeat, but not over a fixed period (for example, business cycles).
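To see how these components can be pulled apart, here is a minimal sketch of classical additive decomposition in NumPy. This is a swapped-in technique for illustration, not something the post itself uses; statsmodels offers a ready-made version in `statsmodels.tsa.seasonal.seasonal_decompose`. The function name `decompose` and the monthly data are hypothetical. Note that a moving-average trend also absorbs any cyclic movement, and the residual is where irregularity ends up.

```python
import numpy as np

def decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Trend: centered moving average over one full period.
    # For an even period, a "2 x period" average (half weights at both ends) keeps it centered.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined at the edges
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal: average detrended value at each position in the period, centered on zero.
    detrended = y - trend
    seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_means -= seasonal_means.mean()
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]
    # Residual: what trend and seasonality do not explain (the irregular part).
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Hypothetical monthly data: linear upward trend plus a 12-month seasonal swing.
t = np.arange(48)
season = 10 * np.sin(2 * np.pi * t / 12)
y = 0.5 * t + season
trend, seasonal, resid = decompose(y, period=12)
```

With noise-free data like this, the recovered trend matches the straight line wherever the moving average is defined, and the residual is essentially zero.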
ARIMA
AutoRegressive Integrated Moving Average, a.k.a. the Box-Jenkins method.
- It is a class of models that forecasts a series using its own past: lagged values and lagged forecast errors.
- The AR (autoregressive) part uses lagged values to forecast.
- The MA (moving average) part uses lagged forecast errors to forecast.
- Applying the two to a differenced series gives ARIMA (“I” stands for Integrated, which refers to the differencing step).
- It has three parameters: p, d, q, usually written ARIMA(p, d, q).
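In common textbook notation (the symbols c, φ, θ and ε are introduced here for illustration and are not from the original post), an ARIMA(p, d, q) model differences the series d times to get y'_t and then models it as:

```latex
y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p}
       + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t,
\qquad y'_t = \Delta^d y_t, \quad \Delta y_t = y_t - y_{t-1}
```

The φ terms are the AR part (p lags of the series), the θ terms are the MA part (q lagged forecast errors ε), and Δ is one round of differencing, applied d times.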
ARIMA is a relatively simple model: it assumes the time series we are working with satisfies the following conditions:
- “Non-seasonal”, meaning different seasons do not affect its values. When seasonality exists, we use SARIMA, short for Seasonal ARIMA.
- No irregularity. Ex: no irregular events like Covid affecting our data.
Now that we know what the ARIMA model is and what it expects, let’s look at its parameters in more detail.
Parameters
p: order of the AR term
- Number of lags of Y used as predictors. In other words, if you are trying to predict June’s sales, how many previous (lagged) months of data will you use?
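A minimal sketch of what "p lags as predictors" means: fit an AR(p) model by ordinary least squares and make a one-step forecast from the last p values. The function names and the simulated AR(2) data are hypothetical, and plain least squares is a simplification; libraries such as statsmodels typically estimate ARIMA models by maximum likelihood instead.

```python
import numpy as np

def lag_matrix(y, p):
    """Row i holds [y[t-1], ..., y[t-p]] for target t = p + i."""
    return np.column_stack([y[p - k:len(y) - k] for k in range(1, p + 1)])

def fit_ar(y, p):
    """Least-squares estimate of c and phi_1..phi_p in y_t = c + sum_i phi_i * y_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - p), lag_matrix(y, p)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]

def forecast_ar(y, c, phi):
    """One-step-ahead forecast from the last p observations."""
    recent = np.asarray(y, dtype=float)[-len(phi):][::-1]  # most recent first
    return c + phi @ recent

# Simulated AR(2) series (hypothetical data): y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + e_t
rng = np.random.default_rng(0)
n = 2000
y = np.zeros(n)
e = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]

c, phi = fit_ar(y, p=2)
next_value = forecast_ar(y, c, phi)
```

With p = 2, predicting "June" uses only "May" and "April"; the estimated `phi` should land close to the true 0.6 and -0.3.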
q: order of the MA term
- Number of lagged forecast errors used as predictors: how many past forecast errors will you use?
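Past forecast errors are not observed directly, so an MA(q) model reconstructs them recursively. The sketch below assumes the mean `mu` and coefficients `theta` are already known (hypothetical values) and that errors before the sample start are zero; estimating them in practice needs numerical optimization such as maximum likelihood, which is what a library would do for you.

```python
import numpy as np

def ma_residuals(y, mu, theta):
    """Recover errors recursively: e_t = y_t - mu - sum_j theta_j * e_{t-j}.

    Errors before the start of the sample are assumed to be 0.
    """
    e = np.zeros(len(y))
    for t in range(len(y)):
        past = sum(theta[j] * e[t - j - 1] for j in range(len(theta)) if t - j - 1 >= 0)
        e[t] = y[t] - mu - past
    return e

def forecast_ma(y, mu, theta):
    """One-step forecast: mu + sum_j theta_j * e_{T+1-j}; the unknown next error is taken as 0."""
    e = ma_residuals(y, mu, theta)
    return mu + sum(theta[j] * e[-j - 1] for j in range(len(theta)))

# Simulated MA(2) series with known parameters (hypothetical values).
rng = np.random.default_rng(1)
mu, theta = 5.0, [0.5, 0.2]
e_true = rng.normal(size=200)
y = np.array([
    mu + e_true[t] + sum(theta[j] * e_true[t - j - 1] for j in range(2) if t - j - 1 >= 0)
    for t in range(200)
])
errors = ma_residuals(y, mu, theta)
next_value = forecast_ma(y, mu, theta)
```

Here q = 2, so the forecast uses only the last two reconstructed errors.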
d: degree of differencing
- Minimum number of times the series must be differenced to make it stationary.
- Data that is already stationary has d = 0.
While reading about each parameter, the term “stationary” was not clear in my mind, so after some research I found the answer to my question:
What does stationary actually mean?
A time series is considered stationary if it has:
- constant mean
- constant variance
- Covariance between two points that depends only on the lag between them, not on time itself
Many real-world series, such as prices, trend upward over time, so consecutive segments will not share a constant mean. The graph below shows Nvidia stock prices, an example of non-stationary data: split it into n periods, take the mean of each, and the means will not be the same.
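The "split into periods and compare" check can be sketched in a few lines. Since the Nvidia data is not included here, this uses a hypothetical synthetic series with an upward trend as a stand-in, next to white noise as a stationary baseline; `segment_stats` is an illustrative name.

```python
import numpy as np

def segment_stats(y, n_segments):
    """Split y into consecutive segments and return the mean and variance of each."""
    segments = np.array_split(np.asarray(y, dtype=float), n_segments)
    means = np.array([s.mean() for s in segments])
    variances = np.array([s.var() for s in segments])
    return means, variances

rng = np.random.default_rng(0)
t = np.arange(1000)
noise = rng.normal(size=1000)   # stationary: white noise
trending = 0.05 * t + noise     # non-stationary: upward trend, like a rising stock price

noise_means, _ = segment_stats(noise, 4)
trend_means, _ = segment_stats(trending, 4)
```

The white-noise segment means all sit near 0, while the trending series' means climb by roughly 12.5 from one segment to the next.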

It is important to check whether our data is stationary because ARIMA’s AR and MA parts assume a stationary series. Often the raw data is non-stationary, so we difference it: subtract the previous value from the current value, y'_t = y_t - y_{t-1}.
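Differencing, and undoing it to get forecasts back on the original scale, is short in NumPy. This is a sketch with hypothetical helper names; `np.diff(..., n=d)` applies first-order differencing d times.

```python
import numpy as np

def difference(y, d=1):
    """Difference the series d times: each pass computes y_t - y_{t-1}."""
    return np.diff(np.asarray(y, dtype=float), n=d)

def undifference(diffed, first_value):
    """Undo one pass of differencing, given the original series' first value."""
    return np.concatenate([[first_value], first_value + np.cumsum(diffed)])

t = np.arange(10)
y = 3 + 2 * t + t ** 2        # quadratic trend
once = difference(y, d=1)     # still trending: 3, 5, 7, ..., 19
twice = difference(y, d=2)    # constant: all 2s
restored = undifference(once, y[0])
```

A linear trend disappears after one difference (d = 1), while this quadratic trend needs two (d = 2), which is exactly what the d parameter counts.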
Since stationarity matters, we need a way to test for it. Common methods of testing whether a time series is stationary are:
- Augmented Dickey-Fuller (ADF) test
- Phillips-Perron (PP) test
- Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (its null hypothesis is stationarity, the reverse of ADF and PP)
- Plotting rolling statistics such as the mean and standard deviation
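To show the idea behind the ADF test, here is a sketch of the plain (non-augmented) Dickey-Fuller regression: regress Δy_t on a constant and y_{t-1}, and compare the t-statistic of the y_{t-1} coefficient with the asymptotic 5% critical value of about -2.86 for this constant-only case. It omits the lagged-difference terms that make the test "augmented" and returns no p-value; in practice you would call `adfuller` from `statsmodels.tsa.stattools`. The function names are hypothetical.

```python
import numpy as np

# Asymptotic 5% critical value of the Dickey-Fuller statistic for a regression
# with a constant and no time trend.
DF_CRITICAL_5PCT = -2.86

def dickey_fuller(y):
    """Regress diff(y)_t on [1, y_{t-1}] by OLS; return (gamma, t-statistic of gamma).

    This is the plain Dickey-Fuller regression: no lagged differences are added,
    unlike the augmented (ADF) version.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se_gamma

def rejects_unit_root(y):
    """True if the DF statistic falls below the 5% critical value (evidence of stationarity)."""
    return dickey_fuller(y)[1] < DF_CRITICAL_5PCT

rng = np.random.default_rng(42)
noise = rng.normal(size=1000)   # stationary
walk = np.cumsum(noise)         # random walk: non-stationary, needs d = 1
```

For a random walk the estimated coefficient on y_{t-1} stays near zero (a unit root), while after one difference the series looks like white noise and the test rejects the unit root.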
Model building in Python
We will use Python 3.8 to build an ARIMA model and predict Nvidia’s closing stock prices.
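As a rough illustration of what an ARIMA model does under the hood, here is a minimal NumPy-only sketch of ARIMA(p, 1, 0): difference once, fit an autoregression on the differences by least squares, forecast the differences recursively, and sum them back onto the last price. It uses simulated prices as a hypothetical stand-in for Nvidia closing prices, leaves out the MA part, and the function names are illustrative. A full ARIMA(p, d, q) is usually fitted with a library, e.g. `ARIMA(series, order=(p, d, q))` from `statsmodels.tsa.arima.model`.

```python
import numpy as np

def fit_arima_p10(y, p):
    """Fit ARIMA(p, 1, 0): difference once, then regress each difference on its p lags."""
    dy = np.diff(np.asarray(y, dtype=float))  # d = 1
    lags = [dy[p - k:len(dy) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(dy) - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, dy[p:], rcond=None)
    return coef[0], coef[1:]

def forecast_arima_p10(y, c, phi, steps):
    """Forecast future differences recursively, then cumulatively sum back to price levels."""
    y = np.asarray(y, dtype=float)
    history = list(np.diff(y))
    diffs = []
    for _ in range(steps):
        recent = np.array(history[-len(phi):][::-1])  # most recent difference first
        next_diff = c + phi @ recent
        history.append(next_diff)
        diffs.append(next_diff)
    return y[-1] + np.cumsum(diffs)

# Hypothetical stand-in for closing prices: daily changes follow an AR(1) with drift.
rng = np.random.default_rng(7)
n = 2000
dy = np.zeros(n)
shocks = rng.normal(size=n)
for t in range(1, n):
    dy[t] = 0.1 + 0.5 * dy[t - 1] + shocks[t]
prices = 100 + np.cumsum(dy)

c, phi = fit_arima_p10(prices, p=1)
forecast = forecast_arima_p10(prices, c, phi, steps=5)
```

Forecasting the differences and then cumulatively summing them is the "integrated" step in action: the model works on a stationary series but reports predictions on the original price scale.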