

Abstract

tion for ARIMA(p,d,q) models</h2><p id="0236">Two approaches:</p><ol><li>Adjust the AIC/AICc/BIC to take into account the extra parameter.</li><li>Test for <b>unit roots</b>.</li></ol><p id="5ced">The first one is identical to what we had considered in the previous article.</p><figure id="c4b8"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*4olQhVGEpBeG3FyCLTAf0w.png"><figcaption></figcaption></figure><figure id="88ba"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Oc9ngKFHOGOq0hObDSjmSA.png"><figcaption></figcaption></figure><p id="cd87">As you can see, this is not too different from what we had before. Model selection is done the same way: select a criterion, fit a set of candidate models to the same dataset, and choose the model with the lowest value of that criterion. So far, this seems like a good approach. However, some statisticians argue that likelihood-based methods cannot be used here because of the differencing: models with different values of <i>d</i> are fitted to differently differenced data, so their likelihoods are not directly comparable. Indeed, how can we test that our choice of <i>d</i>, in particular, is good? Instead, we will test for <b>unit roots. 
</b>The following two approaches are constructed based on that principle:</p><figure id="10a3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*85vzzKNuhPaBBZN2YbEaHw.png"><figcaption></figcaption></figure><p id="e0e9"><b>Intuition</b></p><p id="50b6">Consider the (possibly) non-zero-mean process</p><figure id="a7a0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*8qhtO9iXoWSB0G6vRgOWRA.png"><figcaption></figcaption></figure><figure id="a4cf"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*p8EEvb3LxxUZQicOS0DUCA.png"><figcaption></figcaption></figure><p id="b216">We can take the difference</p><figure id="a3b8"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*aokXKD9H6v20PK05_zDfng.png"><figcaption></figcaption></figure><p id="e080">, where</p><figure id="20bb"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*T_-s0m5rFc0P1mMabPnWAw.png"><figcaption></figcaption></figure><p id="c35d">Therefore,</p><figure id="a1f6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*SonfDIBej-PdH3AbkgWQvw.png"><figcaption></figcaption></figure><p id="8592">then X_{t} is non-stationary. The ADF test extends this idea to AR(p) polynomials.</p><h2 id="00ff">Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test</h2><figure id="6857"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*gEZfPn1ZBoT-KtuVi-8_6A.png"><figcaption></figcaption></figure><p id="ba5d">This test is similar in nature to the previous ones, except that the null and alternative hypotheses are reversed: here the null hypothesis is stationarity, and the alternative is a unit root. More precisely, the null hypothesis states that the time series is stationary around a deterministic trend. This trend can be increasing or decreasing, but it does not affect stationarity once removed. 
If you are curious, the original paper can be found <a href="https://debis.deu.edu.tr/userweb/onder.hanedar/dosyalar/kpss.pdf">here</a>.</p><h2 id="762f">HowToR</h2><p id="809f">As usual, we start by importing some packages:</p> <figure id="9a20"> <div> <div>

            <iframe class="gist-iframe" src="/gist/JairParra/869535950e003c6b4cb8aef02a29c9ce.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="4e10">The data we will be using is the <a href="https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/WWWusage"><code>WWWusage</code> data</a>, available in the <code>R</code> datasets package (you don’t have to download it). This data is a measure of the extent to which people were using the internet over a period of time. First, let’s get a quick summary of the data:</p>
    <figure id="da8d">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/cdc07e2664104065cf5ef3b29f6d499b.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="18da">We see that most values are between 99 and 168. Next, we can plot the data itself, along with its ACF and PACF:</p>
    <figure id="3535">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/ea1c8a1ede6dc4f8421d65b07679b761.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="d2bc"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*MRpBG1dkRsvzxYw4R2u2hQ.png"><figcaption></figcaption></figure><figure id="04a0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*NIJWjqM-62UnoZUmN7Elfg.png"><figcaption></figcaption></figure><figure id="25d0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*_Rsp8HGxf5VEPvnIAKKSNg.png"><figcaption></figcaption></figure><p id="9cdc">Right off the bat, we can see clear indications of non-stationarity in the ACF, and strong partial autocorrelation at the first two lags.</p>
    <figure id="0795">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/4d466ab742003ac3530f9326d0f4f1cc.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="987c">Since we fail to reject the null hypothesis of the ADF test (a unit root) and reject the null hypothesis of the KPSS test (stationarity), both tests give us evidence that the process is indeed not stationary. One thing we can try is to check whether differencing the series, and varying the lag order used to calculate the statistic, makes any difference.</p>
    <figure id="c3f8">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/f4653f58790abd386ea6671cc671c497.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="8f47">We see that among all of these, we reject the null only when using <code>lag-order=1</code>. The problem is that we are not even sure which model this would correspond to, since stationarity, as we saw previously, is a concept closely tied to ARMA(p,q) models. Therefore, when we use this test with respect to some fitted model, we must first assume that the model indeed holds. We should then also make use of other stationarity tests, and keep these caveats in mind.</p><p id="f8bd"><b>Fitting the model</b></p><p id="5a00">The next thing to do, then, is to go ahead and fit some models. We will use the <code>auto.arima</code> function we saw in the previous article. Note that we set the seasonal argument to <code>FALSE</code>. Can you guess what would happen if we set it to <code>TRUE</code>?</p>
    <figure id="5a88">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/90844c28eba084e7c8a092797bffe13c.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined"> </div> </div> </figure></iframe></div></div></figure><p id="4bf3">Notice that we obtained an ARIMA(3,1,0) model. This means that if we difference the series once, the result follows an AR(3) model. Let’s inspect the resulting model and its corresponding roots:</p> <figure id="68f6"> <div> <div>

            <iframe class="gist-iframe" src="/gist/JairParra/19244c097ae22942ae3fd693a1ffd604.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="be19"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*nSpRgEkffMKsXN7UVlhn0g.png"><figcaption></figcaption></figure><p id="d62c">This tells us that after differencing, the model should indeed be causal and stationary, since the inverse roots all fall within the unit circle. We can verify this by applying the ADF test on the residuals as well:</p>
    <figure id="0b04">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/d1605388bfcc77d89d7855306115e0fc.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="d69d">Similarly, plotting the residuals together with their ACF and PACF leads us to the same conclusion:</p>
    <figure id="9f4b">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/249e42874e0f0e401ac636f96b9988aa.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="c0e1"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*GZBivWm_c4z3s2h6kR4g4g.png"><figcaption></figcaption></figure><figure id="3e2e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*1D2Wm9Js_egQENNBV9VW6A.png"><figcaption></figcaption></figure><figure id="7b8b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*RZ1NWnYvG148yzRSFR5P0A.png"><figcaption></figcaption></figure><p id="c41e">Note that we could also constrain the orders considered by the <code>auto.arima</code> function, so that the polynomial degrees or the differencing order do not exceed a given value, or fix them outright. For instance, we can set <code>d=2</code>, which leaves us with an ARIMA(2,2,0) as our best model:</p>
    <figure id="dbde">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/9a2975021e7ba6c85b6b66a18c665904.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure>
    <figure id="a9a3">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/bc5b97114e5abb24403ba8d1076907d1.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="e0d8"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*92rR4H6cs4Hhd-D_I-GJuQ.png"><figcaption></figcaption></figure><figure id="60be"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*w2DlSuPy4XcViZ4V4lge7A.png"><figcaption></figcaption></figure><figure id="ebe2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ScUzYqFqYAou-pXet9C-yw.png"><figcaption></figcaption></figure><p id="1a87">We see that in this case, the resulting twice-differenced ARIMA(2,2,0) model is actually comparable in fit to the one we obtained before.</p><h2 id="4c77">Next Time</h2><p id="3ec6">And that’s it for now! In the next article, we will cover the so-called Seasonal ARIMA or <a href="https://hair-parra.medium.com/a-complete-introduction-to-time-series-analysis-with-r-sarima-models-ff86d526d1d7">SARIMA models</a>, another useful extension in our Time Series models arsenal.</p><div id="eb2c" class="link-block">
      <a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-sarima-models-ff86d526d1d7">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R):: SARIMA models</h2>
            <div><h3>In the last article, we saw one important useful extension to the ARMA models: the Autoregressive Integrated Moving…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*h5GWLGAfxYx4BEi8K6cLuA.png)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="2754">Last time</h2><p id="a4a0"><a href="https://hair-parra.medium.com/a-complete-introduction-to-time-series-analysis-with-r-model-selection-for-arma-p-q-ebc338e6d159">Model Selection for ARMA(p,q)</a></p><div id="6f94" class="link-block">
      <a href="https://hair-parra.medium.com/a-complete-introduction-to-time-series-analysis-with-r-model-selection-for-arma-p-q-ebc338e6d159">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R):: Model Selection for ARMA(p,q)</h2>
            <div><h3>In the last section, we learned about Gaussian Time Series, a powerful and flexible assumption when it comes to…</h3></div>
            <div><p>hair-parra.medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*TCExLNOH_a2mUN4cOyguMQ.png)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="ee0c">Main page</h2><div id="dddc" class="link-block">
      <a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-9882f2d44c9d">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R)</h2>
            <div><h3>During these times of the Covid19 pandemic, you have perhaps heard about the collaborative efforts to predict new…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*TL2PeOANEN4zG0_OqoHptQ.jpeg)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="653d">Follow me at</h2><ol><li><a href="https://www.linkedin.com/in/hair-parra-526ba19b/">https://www.linkedin.com/in/hair-parra-526ba19b/</a></li><li><a href="https://github.com/JairParra">https://github.com/JairParra</a></li><li><a href="https://medium.com/@hair.parra">https://medium.com/@hair.parra</a></li></ol></article></body>

A Complete Introduction To Time Series Analysis (with R):: ARIMA models

In the last section, we discussed model selection for ARMA(p,q) models using the AIC, AICc and BIC: criteria based on the likelihood and the number of parameters, which provide a measure for comparing models fitted to the same data. In this article, we revisit the ideas of differencing and seasonality that we studied previously, and see how they can be integrated into the ARMA model. Let’s start by reviewing some essential concepts from the Differencing section.

Differencing

If you need a refresher, you can check this article, in which I discuss all of these in detail. The whole idea of having these operators is that we could essentially simplify some time series by eliminating some systematic trend component (and even some seasonality). How can we formalize this for ARMA(p,q) models?

Autoregressive Integrated Moving Average: ARIMA(p,d,q)

This formalizes the methods of differencing we saw previously under the Classical Decomposition model. In particular, we use the d-difference operator to eliminate trends (and in consequence some of the variances as we previously saw). This implies that the ARIMA(p,d,q) model can be used even for processes with a trend, although it is usually a good idea to remove it anyway!
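
For reference, the usual textbook statement of the model can be written as follows. This is a sketch under assumed notation: B is the backshift operator, φ and θ are the AR and MA polynomials, and {Z_t} is white noise with variance σ².

```latex
\{X_t\} \sim \mathrm{ARIMA}(p,d,q)
\iff
Y_t := (1-B)^d X_t \ \text{is a causal } \mathrm{ARMA}(p,q) \text{ process},
\qquad\text{i.e.}\qquad
\phi(B)\,(1-B)^d X_t = \theta(B)\, Z_t,
\quad \{Z_t\} \sim \mathrm{WN}(0,\sigma^2).
```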

Trivial cases of ARIMA(p,d,q)

As you may guess, there are some equalities we can derive from the ARIMA(p,d,q) model:

Example: ARIMA(1,1,0)

Let’s now make a concrete example: Let {X_t}~ARIMA(1,1,0). Then, this process has the form
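
In backshift notation, the ARIMA(1,1,0) process can be written as below. This is the standard form, under the assumption that φ denotes the single AR coefficient with |φ| < 1 and {Z_t} is white noise.

```latex
(1 - \phi B)(1 - B) X_t = Z_t
\quad\Longleftrightarrow\quad
Y_t = \phi\, Y_{t-1} + Z_t, \qquad Y_t := X_t - X_{t-1}.
```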

Now, what would happen in the case the phi coefficient is equal to zero, and in the case it is not?

which is a random walk, and clearly not stationary. However, notice that

That is, by differencing we obtain white noise, which is a stationary process.

Also, we have that
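
To see the φ = 0 case numerically, here is a minimal sketch in base R. The seed, the sample size and the Gaussian noise are arbitrary choices made for illustration only.

```r
# Simulate a random walk X_t = X_{t-1} + Z_t and difference it once.
set.seed(42)                 # arbitrary seed, for reproducibility only
n <- 200                     # arbitrary sample size
z <- rnorm(n)                # Gaussian white noise Z_1, ..., Z_n
x <- cumsum(z)               # random walk started at X_0 = 0
y <- diff(x)                 # Y_t = X_t - X_{t-1}

# Differencing gives back the noise terms Z_2, ..., Z_n
recovered <- isTRUE(all.equal(y, z[-1]))
print(recovered)
```

The first noise term is lost because differencing a series of length n yields n - 1 values.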

which follows as the process is causal. (See this article). Therefore, we can rewrite it as

Once again, X_{t} is clearly not stationary, as it is a random walk whose increments form an AR(1) process; however, we see that Y_{t} is!
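
The φ ≠ 0 case can be sketched the same way with `arima.sim` and `arima` from base R. The value φ = 0.6, the seed and the sample size are assumptions chosen for illustration.

```r
# Simulate an ARIMA(1,1,0) process and check that its first difference
# behaves like an AR(1) process with the same coefficient.
set.seed(1)                  # arbitrary seed
phi <- 0.6                   # illustrative AR coefficient, |phi| < 1
n <- 500                     # length before un-differencing

x <- arima.sim(model = list(order = c(1, 1, 0), ar = phi), n = n)
y <- diff(x)                 # Y_t = X_t - X_{t-1}, an AR(1) process

# Fit a zero-mean AR(1) to the differenced series
fit_y <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
phi_hat <- unname(coef(fit_y)["ar1"])
print(phi_hat)
```

The estimate should land near the value used in the simulation, up to sampling error.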

Stationarity of ARIMA(p,d,q) models

Proof idea

We illustrate with an ARIMA(1,1,1) process, but the argument generalizes to ARIMA(p,d,q). We can analyse the underlying Y_{j}’s if we take the difference:

Here, let’s assume that the AR(p) and MA(q) polynomials have all their roots outside the unit circle, so that the differenced process is causal and invertible (see this article). However, the polynomial

has d roots on the unit circle, so X_{t} is clearly not stationary.
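
As a small numerical check of this idea, we can compute the roots of the combined polynomial with `polyroot`. The sketch below assumes a single AR coefficient φ = 0.5 and d = 1, both chosen for illustration.

```r
# Roots of phi(z) * (1 - z)^d for an illustrative AR(1) part with d = 1.
phi <- 0.5

# (1 - phi z)(1 - z) = 1 - (1 + phi) z + phi z^2,
# with coefficients given in increasing powers of z
full_poly <- c(1, -(1 + phi), phi)

roots <- polyroot(full_poly)
mods <- sort(Mod(roots))     # moduli of the roots
print(mods)
```

One root comes from the AR part and lies outside the unit circle (modulus 1/φ), while the differencing factor contributes a root with modulus exactly 1.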

Model Selection for ARIMA(p,d,q) models

Two approaches:

Adjust the AIC/AICc/BIC to take into account the extra parameter.
Test for unit roots.

The first one is identical to what we had considered in the previous article.

As you can see, this is not too different from what we had before. Model selection is done the same way: select a criterion, fit a set of candidate models to the same dataset, and choose the model with the lowest value of that criterion. So far, this seems like a good approach. However, some statisticians argue that likelihood-based methods cannot be used here because of the differencing: models with different values of d are fitted to differently differenced data, so their likelihoods are not directly comparable. Indeed, how can we test that our choice of d, in particular, is good? Instead, we will test for unit roots. The following two approaches are constructed based on that principle:
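
The criterion-based approach can be sketched with base R alone. The candidate grid (p up to 3, q up to 2) is an arbitrary assumption, and d is held fixed at 1 so that every model is fitted to the same differenced data.

```r
# Compare candidate ARIMA(p, 1, q) models on the same data by AIC and BIC.
y <- WWWusage
d <- 1                                   # differencing order held fixed

grid <- expand.grid(p = 0:3, q = 0:2)    # illustrative candidate orders
grid$AIC <- NA_real_
grid$BIC <- NA_real_

for (i in seq_len(nrow(grid))) {
  # Some orders may fail to fit; skip those instead of stopping
  fit <- tryCatch(
    arima(y, order = c(grid$p[i], d, grid$q[i]), method = "ML"),
    error = function(e) NULL
  )
  if (!is.null(fit)) {
    grid$AIC[i] <- AIC(fit)
    grid$BIC[i] <- BIC(fit)
  }
}

best <- grid[which.min(grid$AIC), ]      # lowest AIC among fitted models
print(best)
```

Keeping d fixed sidesteps the objection above; choosing d itself is left to the unit root tests.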

Intuition

Consider the (possibly) non-zero-mean process

We can take the difference

, where

Therefore,

then X_{t} is non-stationary. The ADF test extends this idea to AR(p) polynomials.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test
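
Written out, the argument runs as follows. This is the standard Dickey-Fuller setup for an AR(1) process with mean μ; the symbols μ, φ_1 and the starred coefficients are notation assumed here.

```latex
X_t - \mu = \phi_1 (X_{t-1} - \mu) + Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2)

\nabla X_t = X_t - X_{t-1} = \phi_0^{*} + \phi_1^{*} X_{t-1} + Z_t,
\qquad \phi_0^{*} = \mu (1 - \phi_1), \quad \phi_1^{*} = \phi_1 - 1

H_0 : \phi_1^{*} = 0 \ (\text{unit root}, \ \phi_1 = 1)
\qquad \text{vs.} \qquad
H_1 : \phi_1^{*} < 0 \ (\phi_1 < 1)
```

Failing to reject H_0 is therefore read as evidence of a unit root.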

This test is similar in nature to the previous ones, except that the null and alternative hypotheses are reversed: here the null hypothesis is stationarity, and the alternative is a unit root. More precisely, the null hypothesis states that the time series is stationary around a deterministic trend. This trend can be increasing or decreasing, but it does not affect stationarity once removed. If you are curious, the original paper can be found here.
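
As an illustration, here is a sketch using `kpss.test` from the `tseries` package on a simulated series that is stationary around a linear trend. The slope, seed and sample size are arbitrary assumptions.

```r
library(tseries)

# Simulated series: linear trend plus white noise (trend-stationary)
set.seed(7)                                       # arbitrary seed
n <- 200
trend_stationary <- 0.05 * seq_len(n) + rnorm(n)  # illustrative slope

# null = "Trend" tests stationarity around a deterministic trend
res <- kpss.test(trend_stationary, null = "Trend")
print(res$p.value)
```

With `null = "Level"` (the default), the same function tests stationarity around a constant level instead.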

HowToR

As usual, we start by importing some packages:

The data we will be using is the WWWusage data, available in the R datasets package (you don’t have to download it). This data is a measure of the extent to which people were using the internet over a period of time. First, let’s get a quick summary of the data:
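
A minimal way to get that summary with base R (not necessarily the exact code used in the original gist) is:

```r
# WWWusage ships with base R in the datasets package
y <- WWWusage

print(class(y))     # a time series object
print(length(y))    # number of observations
print(summary(y))   # minimum, quartiles, mean and maximum
```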

We see that most values are between 99 and 168. Next, we can plot the data itself, along with its ACF and PACF:
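
These three plots can be sketched with base R graphics. The panel layout, titles, axis label and the maximum lag of 20 are presentation choices assumed here.

```r
y <- WWWusage

op <- par(mfrow = c(3, 1))                       # three stacked panels
plot(y, main = "WWWusage", ylab = "Users")       # the series itself
acf_y  <- acf(y,  lag.max = 20, main = "ACF")    # sample autocorrelations
pacf_y <- pacf(y, lag.max = 20, main = "PACF")   # sample partial autocorrelations
par(op)                                          # restore graphics settings

# The first element of acf_y$acf is lag 0, so lag 1 is the second element
print(acf_y$acf[2])
```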

Right off the bat, we can see clear indications of non-stationarity in the ACF, and strong partial autocorrelation at the first two lags.
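
To back up the visual impression with formal tests, one option is the `adf.test` and `kpss.test` functions from the `tseries` package, left at their default settings here. This is a sketch; the original gist may use different arguments.

```r
library(tseries)

y <- WWWusage

# ADF: the null hypothesis is a unit root (non-stationarity)
adf_res <- suppressWarnings(adf.test(y))

# KPSS: the null hypothesis is (level) stationarity
kpss_res <- suppressWarnings(kpss.test(y))

print(adf_res)
print(kpss_res)
```

Both functions can warn when the p-value falls outside their tabulated range, which is why the calls are wrapped in `suppressWarnings` here.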

Since we fail to reject the null hypothesis of the ADF test (a unit root) and reject the null hypothesis of the KPSS test (stationarity), both tests give us evidence that the process is indeed not stationary. One thing we can try is to check whether differencing the series, and varying the lag order used to calculate the statistic, makes any difference.
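
One way to run this experiment is to apply `adf.test` to the once-differenced series for several lag orders `k`. The range 1 to 5 is an arbitrary assumption for this sketch.

```r
library(tseries)

dy <- diff(WWWusage)          # once-differenced series
lags <- 1:5                   # illustrative lag orders

# p-value of the ADF test for each lag order
pvals <- sapply(lags, function(k) {
  suppressWarnings(adf.test(dy, k = k)$p.value)
})

print(data.frame(lag_order = lags, p_value = pvals))
```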

We see that among all of these, we reject the null only when using lag-order=1. The problem is that we are not even sure which model this would correspond to, since stationarity, as we saw previously, is a concept closely tied to ARMA(p,q) models. Therefore, when we use this test with respect to some fitted model, we must first assume that the model indeed holds. We should then also make use of other stationarity tests, and keep these caveats in mind.

Fitting the model

The next thing to do, then, is to go ahead and fit some models. We will use the auto.arima function we saw in the previous article. Note that we set the seasonal argument to FALSE. Can you guess what would happen if we set it to TRUE?
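
A minimal call looks like the sketch below, using `auto.arima` from the `forecast` package. All search settings other than `seasonal` are left at their defaults, which is an assumption: the order that gets selected can depend on those settings and on the package version.

```r
library(forecast)

# Automatic order selection, with the seasonal search turned off
fit <- auto.arima(WWWusage, seasonal = FALSE)

print(fit)               # selected model, coefficients and criteria
print(arimaorder(fit))   # the selected (p, d, q)
```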

Notice that we obtained an ARIMA(3,1,0) model. This means that if we difference the series once, the result follows an AR(3) model. Let’s inspect the resulting model and its corresponding roots:
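
The roots can also be checked by hand. The sketch below fits the ARIMA(3,1,0) directly with base R's `arima` (an assumption, so that the result does not depend on the automatic search) and computes the inverse roots of the AR polynomial.

```r
# Fit the ARIMA(3,1,0) model directly
fit <- arima(WWWusage, order = c(3, 1, 0))

# AR polynomial phi(z) = 1 - phi_1 z - phi_2 z^2 - phi_3 z^3
ar_coefs <- coef(fit)[c("ar1", "ar2", "ar3")]
ar_roots <- polyroot(c(1, -ar_coefs))

# Inverse roots should lie strictly inside the unit circle
inv_mod <- Mod(1 / ar_roots)
print(inv_mod)
```

With the `forecast` package loaded, `autoplot(fit)` draws the inverse roots against the unit circle.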

This tells us that after differencing, the model should indeed be causal and stationary, since the inverse roots all fall within the unit circle. We can verify this by applying the ADF test on the residuals as well:
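
A sketch of this check, again fitting the ARIMA(3,1,0) with base R's `arima` and using `adf.test` from `tseries` with its default lag order (both assumptions):

```r
library(tseries)

fit <- arima(WWWusage, order = c(3, 1, 0))
res <- residuals(fit)                 # one-step-ahead residuals

# ADF test on the residuals; the null hypothesis is a unit root
adf_res <- suppressWarnings(adf.test(res))
print(adf_res)
```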

Similarly, plotting the residuals together with their ACF and PACF leads us to the same conclusion:
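
These diagnostic plots can be sketched in base R as follows. The Ljung-Box test at the end is an addition not mentioned in the text, and its lag of 10 is an arbitrary choice.

```r
fit <- arima(WWWusage, order = c(3, 1, 0))
res <- residuals(fit)

op <- par(mfrow = c(3, 1))
plot(res, main = "Residuals", ylab = "Residual")
acf(res,  main = "ACF of residuals")
pacf(res, main = "PACF of residuals")
par(op)

# Ljung-Box test for remaining autocorrelation;
# fitdf = 3 accounts for the three estimated AR coefficients
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 3)
print(lb)
```

If the model is adequate, the residual ACF and PACF should show no systematic pattern.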

Note that we could also constrain the orders considered by the auto.arima function, so that the polynomial degrees or the differencing order do not exceed a given value, or fix them outright. For instance, we can set d=2, which leaves us with an ARIMA(2,2,0) as our best model:

We see that in this case, the resulting twice-differenced ARIMA(2,2,0) model is actually comparable in fit to the one we obtained before.
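
In `forecast`, the `d` argument of `auto.arima` fixes the differencing order, while `max.p` and `max.q` cap the polynomial degrees. A sketch with the other settings left at their defaults (an assumption, so the selected p and q may vary):

```r
library(forecast)

# Force second-order differencing and search only over p and q
fit2 <- auto.arima(WWWusage, d = 2, seasonal = FALSE)

print(fit2)
ord <- arimaorder(fit2)   # named vector with entries p, d, q
print(ord)
```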

Next Time

And that’s it for now! In the next article, we will cover the so-called Seasonal ARIMA or SARIMA models, another useful extension in our Time Series models arsenal.


Last time

Model Selection for ARMA(p,q)


Main page

A Complete Introduction To Time Series Analysis (with R)
