avatarHair Parra

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

7750

Abstract

igcaption></figcaption></figure><p id="dccf">Now we can go ahead and generate the actual data.</p> <figure id="921c"> <div> <div>

            <iframe class="gist-iframe" src="/gist/JairParra/8ef19267930351d3ed00c049f2719155.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="9a49"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*8On0L8mRW5TfX6mCd2QJ5Q.png"><figcaption></figcaption></figure>
    <figure id="080b">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/6d298f5c9ecaf7b981d542bab504b09d.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="34c5"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*hFlpMBUr7FayweQHJe-V4g.png"><figcaption></figcaption></figure>
    <figure id="2b43">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/04ceb856292e67c591661e58fc4b4b9b.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="919a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*JvqQ7GfrMO-V42XVbLnCXw.png"><figcaption></figcaption></figure><p id="9e57">Note how here, the ACF shows three points that fall out of bounds, exactly the MA(3) specification. The PACF shows more than 3 points out of bounds, but notice how only three of them are quite far from the bounds themselves, as we get this oscillating but decreasing behavior. Let’s now use the Innovations Algorithm <code>itsmr::ia</code>to estimate the coefficients, and then another function to pack them together to visualize the progression of the estimation along with confidence bounds for every point. The code for this is quite complex, so it’s not essential that you understand what’s going on, but I left some comments regardless.</p>
    <figure id="0052">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/231f8cc3eec0a2ec6ddfe0ac907b2138.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="64c7">Now the visualization:</p>
    <figure id="5de2">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/af35c3c13d721cfb9bb04479392d6c9a.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="debd"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*a_KicH8Wp3huGv5J3xC2DA.png"><figcaption></figcaption></figure><p id="341e">Now, remember that the true coefficients we specified c(0.6, 0, 0.3, 0), yet here we even seem to have an extra coefficient and the estimated second coefficient is also quite far from zero! As we mentioned before, the problem with using the Innovations algorithm alone is that we need a lot of data and iterations for it to yield good estimates. You can try and repeat this experiment, this time with <i>m=1000 </i>instead of 500, and see how this helps to improve the estimations. Let’s fit an MA(4) model (with Gaussian MLE, which we will see in the next article) to verify the residuals.</p>
    <figure id="7992">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/35f5c98c1640bd7f78773373952b3239.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="487f">We see that this time the coefficients are much closer to the actual ones: that is c(0.6, 0, 0.3, 0). Inspecting the roots, we see that although we do have the extra coefficient one, this one actually lies very close to the boundary.</p>
    <figure id="37d4">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/a1477df8a24ed56ac6b84ce4dcb988ea.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="6921"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*whDyceCKo3wegM9is_EioA.png"><figcaption></figcaption></figure><p id="eacb">Checking the residuals:</p>
    <figure id="48e8">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/2c83fd53f644cd4be100bd55b40e88d4.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="babf"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*wYCqopyqDvjI0LzO7T2KhQ.png"><figcaption></figcaption></figure><p id="c73f">Which indeed are proof of a stationary time series.</p><p id="4a28"><b>Estimating ARMA(p,q) with Hannan-Rissennan</b></p><p id="a857">Let’s generate now some data for an ARMA(2,1) model:</p>
    <figure id="12e3">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/6d333ac1787a974c9d7f35f100268d61.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="9052">As usual, let’s visualize our data:</p>
    <figure id="77e8">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/1924366abdccdf9322a629fa237e045e.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="18e1"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*n83H04u83kXYdxfhwXyT5Q.png"><figcaption></figcaption></figure><figure id="6f7c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*KiL0XhzqYJeIs8MG-FeeDg.png"><figcaption></figcaption></figure><p id="5804">Once again, you see that it’s actually pretty hard to figure out what to fit the method. In order to apply this algorithm, however, we have to make sure our series is a zero-mean process. We this as follows:</p>
    <figure id="b343">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/71886173b69f1d9b09f76c02bf92dd21.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="ec05">Recall true values were AR(0.6,0.2) and MA(0.4), these are not very close! Let’s compare this to the MLE estimates:</p>
    <figure id="e3b2">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/63cb018fb67bce0c47375281fe9f9564.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="5528">Notice that these are much better estimates. We can see that the estimated roots also provide proof of stationarity, causality,

Options

and invertibility:</p> <figure id="d7f9"> <div> <div>

            <iframe class="gist-iframe" src="/gist/JairParra/116b9b646ce90dff1d92d14973ad2bc9.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="851f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*2jjfHqo6YvVB8BZ2Yx1P-w.png"><figcaption></figcaption></figure><p id="fda7">Finally, let’s inspect the residuals:</p>
    <figure id="e877">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/a8b4ac938e0589a04629848e74402092.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="a1dc"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*gp0tDiDj_sBk0WhdCVBQbQ.png"><figcaption></figcaption></figure><p id="7f1f">These all seem to indicate stationarity, as well as normality of the residuals, and an apparent White Noise process. To compare both algorithms, we will first create a <code>tibble</code> object:</p>
    <figure id="75da">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/9ca375c51c1f1cdc1d244e47342eb46c.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="9fae"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*o6svjlj7ASlJ92AxEKJJMw.png"><figcaption></figcaption></figure><p id="ce03">Now we compare with larger values of n with a modified version of the Hannan-Rissennan algorithm, which does not compute sigma square. Don’t worry about understanding the code itself.</p>
    <figure id="fa64">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/9d3947e4745541a404d579e0e3816965.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="32cd">Using this fact function, we can go and refit the data again:</p>
    <figure id="5930">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/fbb6977e95850aac9e5d85c8516514a9.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="2adf">Next, let’s verify the roots:</p>
    <figure id="3a95">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/7cb0334ae6feae0c525f4e595252ec89.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="32ab"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*S9Va1svSS-qss94nCjuMPw.png"><figcaption></figcaption></figure><p id="c3f6">And the residuals:</p>
    <figure id="ac39">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/a0050befcc1103cfede55b000e3685c3.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="8859"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*TX3dHDcO4EdF5BBSPaQxuw.png"><figcaption></figcaption></figure><p id="684c">Once again, let’s compare the coefficients:</p>
    <figure id="8cd8">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/994c501ff0058b4164a5bb7fc48671f9.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="3cf4"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*USKZT0rqS2el74boBAkSkA.png"><figcaption></figcaption></figure><p id="c2b7">We see that now, after running Hannan Risannen with a larger sample size, the estimated coefficient are a bit better than before, but they still do not beat the MLE. This doesn’t mean that Hannan-Rissanen is completely useless, in fact, we can use it to initialize the MLE coefficients, which will ensure faster convergence.</p><h2 id="1c92">Last time</h2><p id="a4a0"><a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-estimation-of-arma-p-q-coefficients-5f8261fd9e27">Estimation of ARMA(p,q) Coefficients (Part I)</a></p><div id="d74e" class="link-block">
      <a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-estimation-of-arma-p-q-coefficients-5f8261fd9e27">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R):: Estimation of ARMA(p,q) Coefficients…</h2>
            <div><h3>In the last article, we discussed the extension of the Innovations algorithm for the more general ARMA(p,q) process…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*WadgbivwqqnmDVsvocoO1g.png)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="4c77">Next Time</h2><p id="3ec6"><a href="https://hair-parra.medium.com/a-complete-introduction-to-time-series-analysis-with-r-gaussian-time-series-cf5e32c5f4e5">Gaussian Time Series</a></p><div id="f8a0" class="link-block">
      <a href="https://hair-parra.medium.com/a-complete-introduction-to-time-series-analysis-with-r-gaussian-time-series-cf5e32c5f4e5">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R):: Gaussian Time Series</h2>
            <div><h3>In the last two articles, we saw a number of methods to independently estimate AR(p) and MA(q) coefficients, namely the…</h3></div>
            <div><p>hair-parra.medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*3ypO896iNAO_5iDXsKntcg.png)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="ee0c">Main page</h2><div id="dddc" class="link-block">
      <a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-9882f2d44c9d">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R)</h2>
            <div><h3>During these times of the Covid19 pandemic, you have perhaps heard about the collaborative efforts to predict new…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*TL2PeOANEN4zG0_OqoHptQ.jpeg)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="653d">Follow me at</h2><ol><li><a href="https://www.linkedin.com/in/hair-parra-526ba19b/">https://www.linkedin.com/in/hair-parra-526ba19b/</a></li><li><a href="https://github.com/JairParra">https://github.com/JairParra</a></li><li><a href="https://medium.com/@hair.parra">https://medium.com/@hair.parra</a></li></ol></article></body>

A Complete Introduction To Time Series Analysis (with R):: Estimation of ARMA(p,q) Coefficients (Part II)

The ARMA(p,q) model implies that X_{t} can be expressed in the form above.

In the last article, we learned about two algorithms to estimate the AR(p) process coefficients: the Yale-Walker equations method, and Burg’s algorithm. In this article, we will now see a very simple way to determine the MA(q) process coefficients, and a first approach to estimate the ARMA(p,q), jointly. Let’s see how this works:

Estimation of MA(q) (Innovations)

As you may guess by the title, the way to estimate the MA(q) coefficients is… the Innovations Algorithm we saw before. Recall that the MA(q) process can be written as

The idea is as follows:

  • Recall that during the iterations of the Innovations algorithm, we obtain the Theta matrix

which in turn provides the coefficients used in the recursive Innovations formula:

. It turns out that these are actually also consistent estimators of the MA(q) process coefficients!

  • We can then apply the Innovations algorithm by substituting the sample ACVF instead of the actual ACVF, that is

, and thus obtain, at the m-th step, a set of coefficients

  • It can be shown that for sufficiently large m

That is, the estimates converge in probability to the true MA(q) parameters.

  • Further, it can be shown that for large m

Estimation of ARMA(p,q) — Hannan-Rissanen Algorithm

So far, we have come up with methods to independently estimate AR(p) or MA(q) coefficients, but not yet a way to estimate both together. The Hannan-Rissanen algorithm does precisely this. The idea is as follows:

  1. Consider only the AR(p) part of the model, that is, assume

, and estimate its coefficients using the Yule-Walker equations or a combination of Burg’s algorithm + Durbin-Levinson.

2. Use the resulting AR(p) coefficients to estimate the White Noise process

3. Now, recall that the ARMA(p,q) has the form

Using step 2), this implies that

4. Using this estimation, we can now use Multiple Linear Regression to find suitable parameters. That is, we use least squares to solve the Sum of Square Residuals (SSR) given by

, on parameters phi and theta of interest, which will be the final estimated coefficients for our ARMA(p,q) model at hand :). If you are a bit rushy on Linear Regression, check this article for a refresher!

Moment Estimators

We have now seen a variety of methods to estimate AR(p), MA(q) or ARMA(p,q) coefficients by using either the ACF or PACF:

  1. Yule-Walker for AR(p) (ACF)
  2. Burg’s algorithm + Durbin-Levinson for the AR(p) (PACF)
  3. Innovations algorithm for MA(q )(ACF)
  4. Hannan-Rissanen for ARMA(p,q) (ACF/PACF + MLR)

Problem: Although all of these algorithms are quite elaborated and produce theoretically consistent parameter estimators, in practice, we have trouble getting a small asymptotic variance. In other words, these tend not to perform so well in all datasets! especially when the true model is misspecified. In such cases, non-linearity can produce rather bizarre results and overall poor fit. Often, we also require a large n (number of observations) to obtain useful results, but due to the nature of Time Series data, sometimes data might be limited. Therefore, we would like to add another assumption: specifying underlying White Noise as Gaussian. We will explore this in the next article :).

How To R

As usual, we first start by importing a couple of useful libraries.

Estimating MA(q) coefficients

This time, we will use the Innovations Algorithm to estimate MA(q) coefficients. Let’s start by simulating some data, and checking its roots to verify invertibility:

Now we can go ahead and generate the actual data.

Note how here, the ACF shows three points that fall out of bounds, exactly the MA(3) specification. The PACF shows more than 3 points out of bounds, but notice how only three of them are quite far from the bounds themselves, as we get this oscillating but decreasing behavior. Let’s now use the Innovations Algorithm itsmr::iato estimate the coefficients, and then another function to pack them together to visualize the progression of the estimation along with confidence bounds for every point. The code for this is quite complex, so it’s not essential that you understand what’s going on, but I left some comments regardless.

Now the visualization:

Now, remember that the true coefficients we specified c(0.6, 0, 0.3, 0), yet here we even seem to have an extra coefficient and the estimated second coefficient is also quite far from zero! As we mentioned before, the problem with using the Innovations algorithm alone is that we need a lot of data and iterations for it to yield good estimates. You can try and repeat this experiment, this time with m=1000 instead of 500, and see how this helps to improve the estimations. Let’s fit an MA(4) model (with Gaussian MLE, which we will see in the next article) to verify the residuals.

We see that this time the coefficients are much closer to the actual ones: that is c(0.6, 0, 0.3, 0). Inspecting the roots, we see that although we do have the extra coefficient one, this one actually lies very close to the boundary.

Checking the residuals:

Which indeed are proof of a stationary time series.

Estimating ARMA(p,q) with Hannan-Rissennan

Let’s generate now some data for an ARMA(2,1) model:

As usual, let’s visualize our data:

Once again, you see that it’s actually pretty hard to figure out what to fit the method. In order to apply this algorithm, however, we have to make sure our series is a zero-mean process. We this as follows:

Recall true values were AR(0.6,0.2) and MA(0.4), these are not very close! Let’s compare this to the MLE estimates:

Notice that these are much better estimates. We can see that the estimated roots also provide proof of stationarity, causality, and invertibility:

Finally, let’s inspect the residuals:

These all seem to indicate stationarity, as well as normality of the residuals, and an apparent White Noise process. To compare both algorithms, we will first create a tibble object:

Now we compare with larger values of n with a modified version of the Hannan-Rissennan algorithm, which does not compute sigma square. Don’t worry about understanding the code itself.

Using this fact function, we can go and refit the data again:

Next, let’s verify the roots:

And the residuals:

Once again, let’s compare the coefficients:

We see that now, after running Hannan Risannen with a larger sample size, the estimated coefficient are a bit better than before, but they still do not beat the MLE. This doesn’t mean that Hannan-Rissanen is completely useless, in fact, we can use it to initialize the MLE coefficients, which will ensure faster convergence.

Last time

Estimation of ARMA(p,q) Coefficients (Part I)

Next Time

Gaussian Time Series

Main page

Follow me at

  1. https://www.linkedin.com/in/hair-parra-526ba19b/
  2. https://github.com/JairParra
  3. https://medium.com/@hair.parra
Statistics
Machine Learning
Mathematics
Forecasting
Timeseries
Recommended from ReadMedium