avatarHair Parra

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

7968

Abstract

umyk7IQ.png"><figcaption></figcaption></figure><p id="ec12">given the previous <i>i</i> observations.</p><figure id="f10f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*hGQ6DOrUn8AeLyhWIKLUeg.png"><figcaption></figcaption></figure><figure id="4485"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Rp_0TvYlbXoRcCyeJTje9A.png"><figcaption></figcaption></figure><p id="137e">, where</p><figure id="322f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*sLmm3cjVAB9uMNdGu7FN2Q.png"><figcaption></figcaption></figure><p id="b97b">Note that for each <i>i </i>, we are looking at the <b>innovation</b> that we get from taking the BLP using the previous <i>i</i> observations for looking at a particular time in our sequence; i.e. it works us backwards from the last time point all the way back to the earlierst point we can go by <i>i</i> observations.</p><figure id="6c11"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*2yjrPidBLlhEW6v267B-4w.png"><figcaption></figcaption></figure><figure id="4ba6"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*8qZS4uqs9eV9SbOsjToQOA.png"><figcaption></figcaption></figure><figure id="ed94"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*KNb-T2xEeh-oDSmuFG-Afw.png"><figcaption></figcaption></figure><figure id="9534"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*E_giUmXGA8q1CyvhOJFmdg.png"><figcaption></figcaption></figure><figure id="a10a"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*3olprFTpyCjC-5Fb5-4stg.png"><figcaption></figcaption></figure><p id="69ed">where</p><figure id="f434"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*A6_XRLCAxET-cQBqowqUSw.png"><figcaption></figcaption></figure><figure id="2af0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*jpAQAKZcZZZxQAR_JSavYQ.png"><figcaption></figcaption></figure><figure id="08a3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*RHCVeaPyuXFHij-v71-Q6w.png"><figcaption></figcaption></figure><p id="280f">Note that both <i>u </i>and <i>v </i>are functions of the data and phi parameters!</p><figure id="5552"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*PUHuLdxtzTduONNFD4V25w.png"><figcaption></figcaption></figure><figure id="53a8"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*WadgbivwqqnmDVsvocoO1g.png"><figcaption></figcaption></figure><p id="6474">In addition, we use</p><figure id="ebbe"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*MqSaL9ZuYTWvP1RSVHjZIw.png"><figcaption></figcaption></figure><p id="8b61">You can think this as estimating the coefficients optimizing (minimizing) a mix between averaged adjacent past and future observations for every time step, and considering these in turn.</p><figure id="8264"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*dscYlaq1pVjf0OMZgMRbyw.png"><figcaption></figcaption></figure><figure id="eaf3"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*XwVMiu7pAlaBSTvyKzuJPw.png"><figcaption></figcaption></figure><p id="802e">6. Although the proof is definitely out of scope, with this we have actually estimated a set of PACF coefficients</p><figure id="c6cd"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*O_ffR5OKY35-fdKHrLnKag.png"><figcaption></figcaption></figure><p id="563e">Which are valid up to step <i>p. </i>At this point, we essentially go back to the <a href="https://hair-parra.medium.com/a-complete-introduction-to-time-series-analysis-with-r-durbin-levinson-algorithm-44e34230bc60"><b>Durbin-Levinson algorithm </b></a>, that is, for <i>n </i>steps, we</p><figure id="dd6e"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*-6qWd1OZHE0BN0xKV1cUxA.png"><figcaption></figcaption></figure><p id="b76d">7. This yields in turn a set of coefficients which can be used as estimations of</p><figure id="612d"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ZWE5VlOSP5vovI18SlVRNg.png"><figcaption></figcaption></figure><p id="04ce">That is, the last value of the phi values at each timestep, which constitute the set of estimated AR(p) coefficients.</p><p id="275d">8. For large <i>n</i> and given a “true” AR(p) model, it holds that</p><figure id="3925"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*YKo0_HwxAMC2evKlY7CCug.png"><figcaption></figcaption></figure><p id="1966">This asymptotic normality ensures “nice” properties when it comes to confidence intervals</p><p id="733c">And that’s pretty much it! Although it might be somehow confusing due to the notation and the various formulas and equations, the basic idea is that Burg’s algorithm produces a set of somehow more statistically sound estimates of the AR(p) model (assuming this one has been correctly specified), by considering not only projections and BLP that use observations from the past, but also even in the “future”, given the data at hand.</p><h2 id="00a9">How to R</h2><p id="ef80">As usual, we first start by importing a couple of useful libraries.</p> <figure id="8e49"> <div> <div>

            <iframe class="gist-iframe" src="/gist/JairParra/11eeee9263b31545d5bb8d36c8b3f6a6.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="5e74"><b>Estimating AR(p) coefficients</b></p><p id="a4a9">We will first start by creating some artificial AR(4) process along with some data generated from it, and then proceed to use the algorithms we have seen to estimate its coefficients. First, let’s generate some data:</p>
    <figure id="89b8">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/75529872cf07612a5fe8bf4cde977b77.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="1410">Now let’s check the roots of this process to make sure it is stationary,</p>
    <figure id="3f85">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/a24a2c9ac3c2dc01f423a5700b9596a1.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="c5d4"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*f9l6yGnb0cRLz8wG9cUHow.png"><figcaption></figcaption></figure><p id="999f">So we see that we have all four different (inverse of the) roots that are all inside the circle, so the process is <b>causal</b> and <b>stationary</b>. (Although see that one of them is close to the edge!!! That is, although the process is stationary, the hardest the estimation will be, the bigger the variance, the worse the performance). Now generate some data and compare sample ACF and PACF to truth</p>
    <figure id="708a">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/a1afd6ab8f4a89a7f5fbf038a90e0460.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="b0f0"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*iIgXeEeEVS8ndb74yOu_6g.png"><figcaption></figcaption></figure><p id="77a3">This definitely doesn’t look like white noise! Let’s check its ACF and PACF functions:</p>
    <figure id="8488">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/258db323a5f09bd347c63e49c0781790.js" allowfullscreen="" frameborder="0" height="undefined"

Options

width="undefined"> </div> </div> </figure></iframe></div></div></figure><figure id="c22f"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*ElGig6imoE4iTAOnltKDuA.png"><figcaption></figcaption></figure><figure id="0aa2"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*lr1K8J5ecyBEzanbxke3hw.png"><figcaption></figcaption></figure><p id="0067">Notice how the number of lags that fall out of bounds is precisely 4; the specified degree of the AR(4) polynomial. In the code above, we have also plotted the actual values (as points). Also, notice how close the estimated ACF and PACF values are actually are to the true ones.</p><p id="331c">Let’s now try to fit a couple of algorithms with the help of the <code>ar</code> function:</p> <figure id="4a0c"> <div> <div>

            <iframe class="gist-iframe" src="/gist/JairParra/efa9cc20f0b7a7ba7c09f7cc9bbe5ee9.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="722c">In the first part of the code block, we fit three models: <b>Yule-Walker</b>, <b>Burg’s Method,</b> and the <b>Gaussian Time Series MLE. </b>Then, we construct a function to pack to repeat this process and also pack all the relevant information we need into one big <code>tibble</code> object. We can then check the tables with information about the true and fitted coefficients and standard errors as follows:</p>
    <figure id="8fa1">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/1459b45fbca15a99dcc5580adc58389d.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="653c"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*mH4sVJifTF-ilZSaN_8PSw.png"><figcaption></figcaption></figure><figure id="ad63"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*OySDeEMyXrof3voxjRx8FA.png"><figcaption></figcaption></figure><figure id="b321"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*VqydqZPCiyr3IoSdhZ7K0Q.png"><figcaption></figcaption></figure><p id="34be">Notice that even though we over-specified the model, the extra-coefficients are actually very close to 0! In general, it’s better to “over-specify” the model, but we will see later a simple, but solid method for model selection.</p><p id="053b">Let’s now go and visualize the actual coefficients and confidence intervals that we obtain from the tables for the different methods:</p>
    <figure id="4358">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/ece2859f455de9b382011d68fe2de355.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="e9f9"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*dpghb2e2wYh1rLp9noMcFw.png"><figcaption></figcaption></figure><p id="5b99">Now, we would like to obtain some diagnostics with the model; we can also use the <code>Arima</code> function to fit an AR(4) model using MLE, and obtain diagnostics.</p>
    <figure id="e435">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/ad1b38d6bdd81847f850a404983c9d2f.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><p id="4d2d">That is, we explicitly specify the order of the model, and then we can see the corresponding estimated coefficients, the sigma², log-likelihood, and other evaluation metrics. Checking the residuals yields</p>
    <figure id="572f">
        <div>
          <div>
            
            <iframe class="gist-iframe" src="/gist/JairParra/1a76f818817c14268e1e032cd79cbdaf.js" allowfullscreen="" frameborder="0" height="undefined" width="undefined">
          </div>
        </div>
    </figure></iframe></div></div></figure><figure id="6a16"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*bRiys3hJiv4gKO8KeC-eEg.png"><figcaption></figcaption></figure><p id="5c49">From which we can see that the model indeed looks like white noise, and also approximately normally distributed. The Ljung-Box test does NOT reject the hypothesis that the residuals are white noise.</p><h2 id="635d">Next time</h2><p id="ce99">In the next article, we will continue with algorithms to estimate the MA(q) coefficients independently, then how to combine these along with the other AR(p) algorithms to estimate the ARMA(p,q) coefficients as a whole, and finally, we study Gaussian Time Series in subsequent articles, one of the most widely used assumptions to estimate ARMA(p,q) parameters. Until next time!</p><div id="09ea" class="link-block">
      <a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-estimation-of-arma-p-q-coefficients-728176749315">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R):: Estimation of ARMA(p,q) Coefficients…</h2>
            <div><h3>In the last article, we learned about two algorithms to estimate the AR(p) process coefficients: the Yale-Walker…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*PVs6x48Z97aE2g_h7_biOQ.png)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="4d23">Last time</h2><p id="ee0c"><a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-prediction-iii-forecasting-with-90867b8a53b8">Forecasting with ARMA(p,q) models</a></p><div id="ead1" class="link-block">
      <a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-prediction-iii-forecasting-with-90867b8a53b8">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R):: Prediction III: Forecasting with…</h2>
            <div><h3>We have come a long way from first exploring the idea of models with way too little or too much dependence, to the…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*sSq_Z0NzCeTO7eGYXoOr8g.png)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="d69c">Main page</h2><div id="dddc" class="link-block">
      <a href="https://readmedium.com/a-complete-introduction-to-time-series-analysis-with-r-9882f2d44c9d">
        <div>
          <div>
            <h2>A Complete Introduction To Time Series Analysis (with R)</h2>
            <div><h3>During these times of the Covid19 pandemic, you have perhaps heard about the collaborative efforts to predict new…</h3></div>
            <div><p>medium.com</p></div>
          </div>
          <div>
            <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/1*TL2PeOANEN4zG0_OqoHptQ.jpeg)"></div>
          </div>
        </div>
      </a>
    </div><h2 id="653d">Follow me at</h2><ol><li><a href="https://www.linkedin.com/in/hair-parra-526ba19b/">https://www.linkedin.com/in/hair-parra-526ba19b/</a></li><li><a href="https://github.com/JairParra">https://github.com/JairParra</a></li><li><a href="https://medium.com/@hair.parra">https://medium.com/@hair.parra</a></li></ol></article></body>

A Complete Introduction To Time Series Analysis (with R):: Estimation of ARMA(p,q) Coefficients (Part I)

Burg’s Algorithm Estimation Formulas

In the last article, we discussed the extension of the Innovations algorithm for the more general ARMA(p,q) process, which allowed us to make predictionsf for arbitrary number of timesteps in the future. However, we still haven’t seen how to estimate the actual ARMA(p,q) model coefficients. In this article, we will see two algorithms for estimating AR(p) coefficients, and in the next article, we will see how to estimate MA(q) and start taking a look into jointly estimating ARMA(p,q) coefficients. Let’s jump right into it!

Estimation of AR(p) :: Yale-Walker

In real world problems, the ACVF is the easiest thing to estimate using the sample data. In other words, recall that we can estimate it as

From which we can in turn estimate autocorrelation as

The Yale-Walker equations rely on the method of moments, to construct statistics that help us estimate such coefficients.

Consider the AR(p) process with coefficients

Then, it’s estimated values (hat phi) satisfy the equations

where

Here, recall that the big Gamma matrix is symmetric with all correlations between 0 and (p-1), and the other vector is the gamma vector of all correlations from 1 to p. Further, this allows estimation of the variance as

, where

Moreover, for large n and given a “true” AR(p) model (that is, given that p is correctly specified)

That is, the parameters follow a multivariate normal distribution as specified above.

Proof

Recall that for a mean-zero process,

. If our process is AR(p), then

That is, we have that

From this, it follows that

This implies that

Differently put, we have a system of p linear equations with p variables; we can use these to solve for the phi coefficients. Written down:

Note that the first equation is independent of the others! Therfore, replacing the sample ACVF versions and writing down in vector notation, we see that this becomes

And therefore we can estimate the coefficients by solving for the phi vector, as stated.

Remark

Suppose that the true model is an AR(p) model, but instead we fit an AR(m) model, where m<p. If this happens, there will be missing coefficients from the true model, and so the estimated coefficients will no longer be (statistically) consistent! On the other hand, if m > p , then we can write

, so that the extra parameters we specified are set to zero, and the model still recovers the true AR(p).

Burg’s Algorithm

Recall that the PACF is given by

wher the Gamma matrix and vector are given by

Recall that the PACF tells us how much of the variablity in our value at time t is explained by a value at lag p, if we are fitting the BLP with t components. If we have a causal AR(p) model, then

The idea is as follows: we can also use the PACF to estimate the AR(p) coefficients. Burg’s algorithm makes precisely use of this.

Consider an AR(p) process.

given the previous i observations.

, where

Note that for each i , we are looking at the innovation that we get from taking the BLP using the previous i observations for looking at a particular time in our sequence; i.e. it works us backwards from the last time point all the way back to the earlierst point we can go by i observations.

where

Note that both u and v are functions of the data and phi parameters!

In addition, we use

You can think this as estimating the coefficients optimizing (minimizing) a mix between averaged adjacent past and future observations for every time step, and considering these in turn.

6. Although the proof is definitely out of scope, with this we have actually estimated a set of PACF coefficients

Which are valid up to step p. At this point, we essentially go back to the Durbin-Levinson algorithm , that is, for n steps, we

7. This yields in turn a set of coefficients which can be used as estimations of

That is, the last value of the phi values at each timestep, which constitute the set of estimated AR(p) coefficients.

8. For large n and given a “true” AR(p) model, it holds that

This asymptotic normality ensures “nice” properties when it comes to confidence intervals

And that’s pretty much it! Although it might be somehow confusing due to the notation and the various formulas and equations, the basic idea is that Burg’s algorithm produces a set of somehow more statistically sound estimates of the AR(p) model (assuming this one has been correctly specified), by considering not only projections and BLP that use observations from the past, but also even in the “future”, given the data at hand.

How to R

As usual, we first start by importing a couple of useful libraries.

Estimating AR(p) coefficients

We will first start by creating some artificial AR(4) process along with some data generated from it, and then proceed to use the algorithms we have seen to estimate its coefficients. First, let’s generate some data:

Now let’s check the roots of this process to make sure it is stationary,

So we see that we have all four different (inverse of the) roots that are all inside the circle, so the process is causal and stationary. (Although see that one of them is close to the edge!!! That is, although the process is stationary, the hardest the estimation will be, the bigger the variance, the worse the performance). Now generate some data and compare sample ACF and PACF to truth

This definitely doesn’t look like white noise! Let’s check its ACF and PACF functions:

Notice how the number of lags that fall out of bounds is precisely 4; the specified degree of the AR(4) polynomial. In the code above, we have also plotted the actual values (as points). Also, notice how close the estimated ACF and PACF values are actually are to the true ones.

Let’s now try to fit a couple of algorithms with the help of the ar function:

In the first part of the code block, we fit three models: Yule-Walker, Burg’s Method, and the Gaussian Time Series MLE. Then, we construct a function to pack to repeat this process and also pack all the relevant information we need into one big tibble object. We can then check the tables with information about the true and fitted coefficients and standard errors as follows:

Notice that even though we over-specified the model, the extra-coefficients are actually very close to 0! In general, it’s better to “over-specify” the model, but we will see later a simple, but solid method for model selection.

Let’s now go and visualize the actual coefficients and confidence intervals that we obtain from the tables for the different methods:

Now, we would like to obtain some diagnostics with the model; we can also use the Arima function to fit an AR(4) model using MLE, and obtain diagnostics.

That is, we explicitly specify the order of the model, and then we can see the corresponding estimated coefficients, the sigma², log-likelihood, and other evaluation metrics. Checking the residuals yields

From which we can see that the model indeed looks like white noise, and also approximately normally distributed. The Ljung-Box test does NOT reject the hypothesis that the residuals are white noise.

Next time

In the next article, we will continue with algorithms to estimate the MA(q) coefficients independently, then how to combine these along with the other AR(p) algorithms to estimate the ARMA(p,q) coefficients as a whole, and finally, we study Gaussian Time Series in subsequent articles, one of the most widely used assumptions to estimate ARMA(p,q) parameters. Until next time!

Last time

Forecasting with ARMA(p,q) models

Main page

Follow me at

  1. https://www.linkedin.com/in/hair-parra-526ba19b/
  2. https://github.com/JairParra
  3. https://medium.com/@hair.parra
Machine Learning
Statistics
Mathematics
Forecasting
Time Series Analysis
Recommended from ReadMedium