A Complete Introduction To Time Series Analysis (with R): ARIMA models

In the last section, we discussed model selection for ARMA(p,q) models using the AIC, AICc, and BIC: criteria based on the likelihood and the number of parameters that let us compare models fitted to the same data. In this article, we revisit the ideas of differencing and seasonality that we studied previously, and see how they can be integrated into the ARMA model. Let’s start by reviewing some essential concepts from the Differencing section.
Differencing
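As a quick reminder, these are the standard operators in the usual textbook notation (the symbols B, \nabla and \nabla_s are assumed here to match the ones used in the Differencing section):

$$
B X_t = X_{t-1}, \qquad B^j X_t = X_{t-j}
$$

$$
\nabla X_t = X_t - X_{t-1} = (1 - B) X_t
$$

$$
\nabla^d X_t = (1 - B)^d X_t
$$

$$
\nabla_s X_t = X_t - X_{t-s} = (1 - B^s) X_t
$$

Here B is the backshift operator, \nabla the difference operator, \nabla^d the d-th order difference, and \nabla_s the lag-s (seasonal) difference.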






If you need a refresher, you can check this article, in which I discuss all of these in detail. The whole idea behind these operators is that we can simplify a time series by eliminating a systematic trend component (and even some seasonality). How can we formalize this for ARMA(p,q) models?
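To make the effect of differencing concrete, here is a small sketch using base R's diff(). The three toy series are made up for illustration: a linear trend, a quadratic trend, and a pure period-4 seasonal pattern.

```r
# Deterministic linear trend: first differences are constant (the slope).
t <- 1:10
x_lin <- 2 + 0.5 * t
d_lin <- diff(x_lin)                      # (1 - B) x_t

# Quadratic trend: second differences are constant.
x_quad <- t^2
d_quad <- diff(x_quad, differences = 2)   # (1 - B)^2 x_t

# Pure period-4 seasonal pattern: lag-4 differences vanish.
x_seas <- rep(c(1, 3, 2, 5), times = 3)
d_seas <- diff(x_seas, lag = 4)           # (1 - B^4) x_t

print(d_lin)
print(d_quad)
print(d_seas)
```

In each case the differenced series no longer contains the systematic component, which is exactly what we want before fitting an ARMA model.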
Autoregressive Integrated Moving Average: ARIMA(p,d,q)
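For reference, the standard textbook definition can be written as follows (the notation is assumed to match the earlier articles). For a non-negative integer d, {X_t} is an ARIMA(p,d,q) process if Y_t = (1 - B)^d X_t is a causal ARMA(p,q) process. Equivalently,

$$
\phi(B)(1 - B)^d X_t = \theta(B) Z_t, \qquad \{Z_t\} \sim WN(0, \sigma^2),
$$

where

$$
\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p, \qquad \theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q .
$$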




This formalizes the methods of differencing we saw previously under the Classical Decomposition model. In particular, we use the d-th order difference operator to eliminate trends (and, as a consequence, some of the variance, as we saw previously). This means that the ARIMA(p,d,q) model can be applied even to processes with a trend, although it is usually a good idea to remove it anyway!
Trivial cases of ARIMA(p,d,q)
As you may guess, there are some equalities we can derive from the ARIMA(p,d,q) model:
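The usual identities, which follow directly from setting some of p, d, q to zero in the general definition, are:

$$
\begin{aligned}
ARIMA(p,0,q) &= ARMA(p,q) \\
ARIMA(p,0,0) &= AR(p) \\
ARIMA(0,0,q) &= MA(q) \\
ARIMA(0,0,0) &= WN(0, \sigma^2) \\
ARIMA(0,1,0) &= \text{random walk}
\end{aligned}
$$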

Example: ARIMA(1,1,0)
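Let’s now work through a concrete example. Let {X_t} ~ ARIMA(1,1,0). Then this process has the form

$$
(1 - \phi B)(1 - B) X_t = Z_t, \qquad \{Z_t\} \sim WN(0, \sigma^2).
$$

Now, what happens when the coefficient φ is equal to zero, and when it is not?

Case φ = 0. The model reduces to

$$
(1 - B) X_t = Z_t \iff X_t = X_{t-1} + Z_t,
$$

which is a random walk, and clearly not stationary. However, notice that

$$
Y_t = \nabla X_t = X_t - X_{t-1} = Z_t.
$$

That is, by differencing we obtain white noise, which is a stationary process.

Case φ ≠ 0 (with |φ| < 1). Writing Y_t = (1 - B) X_t, we get

$$
(1 - \phi B) Y_t = Z_t \iff Y_t = \phi Y_{t-1} + Z_t,
$$

so Y_t is an AR(1) process. Also, we have that

$$
Y_t = \sum_{j=0}^{\infty} \phi^j Z_{t-j},
$$

which follows as the process is causal (see this article). Therefore, we can rewrite X_t as

$$
X_t = X_0 + \sum_{j=1}^{t} Y_j.
$$

Once again, X_{t} is clearly not a stationary process, since it is a random walk whose increments are AR(1) terms. However, we see that Y_{t} is!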
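The ARIMA(1,1,0) example can be checked numerically. The sketch below uses base R only. The coefficient 0.6, the sample size and the seed are arbitrary choices for illustration, and X_0 is taken to be zero.

```r
set.seed(7)
phi <- 0.6   # hypothetical AR coefficient, |phi| < 1
n <- 500

# Y_t: a causal AR(1) process.
y <- as.numeric(arima.sim(model = list(ar = phi), n = n))

# X_t: cumulative sum of the Y_j (taking X_0 = 0), i.e. an ARIMA(1,1,0) path.
x <- cumsum(y)

# Differencing X_t gives back Y_2, ..., Y_n.
recovered <- diff(x)

# Fitting ARIMA(1,1,0) to x should give an AR estimate close to phi.
fit <- arima(x, order = c(1, 1, 0))
phi_hat <- unname(coef(fit)["ar1"])
print(phi_hat)
```

Plotting x and recovered side by side (for example with plot.ts) makes the contrast visible: the first wanders without settling around a level, the second fluctuates around zero.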
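Stationarity of ARIMA(p,d,q) models
An ARIMA(p,d,q) process {X_t} with d ≥ 1 is not stationary, even though its d-th difference Y_t = (1 - B)^d X_t is.
Proof idea
We illustrate with an ARIMA(1,1,1) process, but the argument generalizes to ARIMA(p,d,q). We can analyse the underlying Y_{j}’s if we take the difference:

$$
(1 - \phi B)(1 - B) X_t = (1 + \theta B) Z_t \iff (1 - \phi B) Y_t = (1 + \theta B) Z_t, \qquad Y_t = (1 - B) X_t.
$$

Here, let’s assume that the AR(p) and MA(q) polynomials have all their roots outside the unit circle (see this article), so that Y_t is causal and invertible. However, the polynomial

$$
\phi^*(z) = \phi(z)(1 - z)^d
$$

has a root of multiplicity d at z = 1, that is, d roots on the unit circle, so X_{t} is not stationary.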
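The unit root can be seen numerically with base R's polyroot(). This is a sketch for d = 1 and a single AR coefficient, where the value 0.5 is an arbitrary choice. The polynomial is (1 - phi z)(1 - z) = 1 - (1 + phi) z + phi z^2.

```r
phi <- 0.5  # hypothetical AR coefficient

# Coefficients in increasing powers of z, as polyroot() expects.
ar_star <- c(1, -(1 + phi), phi)

roots <- polyroot(ar_star)
mods <- sort(Mod(roots))   # moduli of the roots, smallest first
print(mods)

# One root has modulus 1 (from the differencing factor);
# the other is 1 / phi, outside the unit circle.
has_unit_root <- any(abs(mods - 1) < 1e-8)
print(has_unit_root)
```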
Model Selection for ARIMA(p,d,q) models
Two approaches:
- Adjust the AIC/AICc/BIC to take into account the extra parameter.
- Test for unit roots.
The first one is identical to what we had considered in the previous article.
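A minimal sketch of the criterion-based approach, using only base R's arima(), AIC() and BIC(). The simulated data, the fixed d = 1 and the grid of orders are assumptions made for illustration.

```r
set.seed(1)
# Simulated ARIMA(1,1,0) data with a hypothetical coefficient of 0.6.
x <- cumsum(as.numeric(arima.sim(model = list(ar = 0.6), n = 300)))

d <- 1  # kept fixed so every candidate is fitted to the same differenced data
grid <- expand.grid(p = 0:2, q = 0:2)

# Fit one candidate and return its criteria; NA if the fit fails.
fit_one <- function(p, q) {
  fit <- tryCatch(arima(x, order = c(p, d, q), method = "ML"),
                  error = function(e) NULL)
  if (is.null(fit)) return(c(AIC = NA_real_, BIC = NA_real_))
  c(AIC = AIC(fit), BIC = BIC(fit))
}

scores <- t(mapply(fit_one, grid$p, grid$q))
results <- cbind(grid, scores)

best <- results[which.min(results$AIC), ]
print(results)
print(best)
```

Keeping d fixed across the grid is a deliberate choice here: models with different d are fitted to different (differenced) series, so their likelihoods are not directly comparable.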


As you can see, this is not too different from what we had before. Model selection is done the same way as before: pick a criterion, fit several candidate models to the same dataset, and choose the model with the lowest value of the criterion. So far, this seems like a good approach. However, some statisticians argue that likelihood-based criteria cannot be used here because of the differencing step. Indeed, how can we test whether our choice of d, in particular, is a good one? Instead, we can test for unit roots. The following two tests are built on that principle:

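Augmented Dickey-Fuller (ADF) test
Intuition
Consider the (possibly) non-zero mean AR(1) process

$$
X_t - \mu = \phi (X_{t-1} - \mu) + Z_t, \qquad \{Z_t\} \sim WN(0, \sigma^2).
$$

We can take the difference

$$
\nabla X_t = X_t - X_{t-1} = \phi_0^* + \phi_1^* X_{t-1} + Z_t,
$$

where

$$
\phi_0^* = \mu (1 - \phi), \qquad \phi_1^* = \phi - 1.
$$

Therefore, if

$$
\phi_1^* = 0 \quad (\text{that is, } \phi = 1),
$$

then X_{t} is non-stationary: it has a unit root. The Dickey-Fuller test takes this as its null hypothesis, and the ADF test extends the idea to AR(p) polynomials.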
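The intuition can be illustrated by regressing the differences on the lagged level, so that the slope estimates phi - 1. This is a base-R sketch on simulated data and not a full test. Under the unit-root null the usual t-statistic does not follow a t distribution, which is why dedicated critical values are needed.

```r
set.seed(42)
n <- 1000

# Slope of the regression of the differences on the lagged level,
# i.e. an estimate of phi - 1.
df_slope <- function(x) {
  dx <- diff(x)
  lagx <- x[-length(x)]
  unname(coef(lm(dx ~ lagx))["lagx"])
}

rw  <- cumsum(rnorm(n))                                       # phi = 1
ar1 <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))   # phi = 0.5

slope_rw  <- df_slope(rw)
slope_ar1 <- df_slope(ar1)
print(c(random_walk = slope_rw, ar1 = slope_ar1))
```

For the random walk the slope should be close to zero, while for the stationary AR(1) series it should be close to 0.5 - 1 = -0.5.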
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test

This test is quite similar in nature to the previous one, except that the null and the alternative hypotheses are reversed: the null hypothesis states that the time series is stationary around a deterministic trend, and the alternative is that it has a unit root. The trend can be increasing or decreasing, but it does not affect stationarity once removed. If you are curious, the original paper can be found here.
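Both tests are available in the tseries package as adf.test() and kpss.test(). The sketch below assumes that package is installed and uses simulated data. Note that tseries interpolates p-values from tables and truncates them at the table limits, issuing a warning when that happens.

```r
library(tseries)  # assumed to be installed

set.seed(123)
x <- cumsum(rnorm(500))   # random walk: has a unit root
y <- diff(x)              # its first difference: white noise

# ADF: null hypothesis = unit root. KPSS: null hypothesis = stationarity.
adf_x  <- suppressWarnings(adf.test(x))
adf_y  <- suppressWarnings(adf.test(y))
kpss_x <- suppressWarnings(kpss.test(x, null = "Level"))
kpss_y <- suppressWarnings(kpss.test(y, null = "Level"))

p_values <- data.frame(
  series = c("x (random walk)", "diff(x)"),
  adf    = c(adf_x$p.value, adf_y$p.value),
  kpss   = c(kpss_x$p.value, kpss_y$p.value)
)
print(p_values)
```

A small ADF p-value together with a large KPSS p-value points towards stationarity. The opposite pattern suggests that another round of differencing is needed.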
HowToR
As usual, we start by importing some packages:
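A minimal sketch of what such a session might look like, assuming the forecast package is installed. The package choice and the simulated series are assumptions for illustration, not the original listing.

```r
library(forecast)  # assumed installed; provides ndiffs() and auto.arima()

set.seed(2024)
# Simulated ARIMA(1,1,0) series with a hypothetical coefficient of 0.6.
x <- ts(cumsum(as.numeric(arima.sim(model = list(ar = 0.6), n = 300))))

# Number of differences suggested by repeated KPSS tests.
d_hat <- ndiffs(x, test = "kpss")

# With d fixed, search over p and q using the AICc.
fit <- auto.arima(x, d = d_hat, ic = "aicc", seasonal = FALSE)

print(d_hat)
print(fit)
```

This combines the two approaches from above: a unit-root test chooses d, and an information criterion chooses p and q for that fixed d.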














