Dickey Fuller Direct Estimation — Speed up to 50x Test Statistic Computation

Avoid unnecessary regressions and matrix inversions by directly estimating the Dickey-Fuller test statistic through the correlation coefficient.

The Dickey-Fuller test is perhaps the most well-known among stationarity (unit root) tests in time series analysis. The computation procedure for the test relies on linear regression results for the concrete formulation of the statistic. However, linear regression requires matrix inversions, which can be computationally intensive and even numerically unstable.

In this story, we will explore the math behind OLS (ordinary least squares) and use that analysis to derive a closed-form expression for the Dickey-Fuller test statistic (using one time lag and a constant). The resulting expression uses only a correlation coefficient; there are no matrix inversions or other computationally intensive operations. This can speed up computations by up to 50x.

Closed-form expression derivation
Closed-form expression result
Sanity check
Speed Tests
p-values
Final Words

Closed-form expression derivation

If you want to skip the mathematical details, scroll to the next section; no harm done.

For those of you still here, let us first state the problem formally. The (non-augmented) Dickey-Fuller test specifies an autoregressive model with one lag and a constant:

with ε i.i.d. This equation can be recast into a form where the time series increment is explicit:

α, β and their variances are estimated through OLS.

The Dickey-Fuller test statistic is defined as:

We could perform OLS regression numerically, get β and its variance and call it a day. This would involve a matrix inversion and matrix multiplications which are computationally taxing. So we are not going to do that.
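For reference, the three expressions described above can be written out as follows. This is a reconstruction from the surrounding text rather than the original typesetting: the level coefficient γ and the hat notation for estimates are assumptions, while α, β and ε are the symbols used in the text.

```latex
% Level form: AR(1) with a constant
S_t = \alpha + \gamma\, S_{t-1} + \varepsilon_t

% Increment form, with \beta = \gamma - 1 (a unit root corresponds to \beta = 0)
\Delta S_t \equiv S_t - S_{t-1} = \alpha + \beta\, S_{t-1} + \varepsilon_t

% Dickey-Fuller test statistic: the OLS t-ratio of \beta
DF = \frac{\hat{\beta}}{\sqrt{\operatorname{Var}(\hat{\beta})}}
```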

The other road we could take is doing the math. It seems that nowadays I spend most of my time crunching numbers on the computer and almost no time on the blackboard doing the actual math. In this case, doing the math does pay off.

First, we will formulate our regression in terms of matrices and vectors as

where S_d is a vector of dimension T made up from the differences of S,

and X is a T×2 matrix

Let S_L be the T-dimensional vector holding the lagged time series S. Then the following relationships hold:

the mean of S_L:

the variance of S_L:

the mean of S_d:

the variance of S_d:

the covariance of S_L and S_d:

Note that we do not use Bessel’s correction because the resulting equations would contain more terms. When in doubt, go for the result that yields the most beautiful mathematical expression. You could try it yourself: follow the next steps using Bessel’s correction for the variances.
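The quantities named above can be written out explicitly. The block below is a sketch reconstructed from the definitions in the text; the symbols μ_L, σ_L², μ_d, σ_d², σ_{L,d}, the column of ones written as a bold 1, and the time index running from 0 to T are notational assumptions. S_d, S_L, X and δ are the names used in the text.

```latex
% Regression in matrix form, with \delta = (\alpha, \beta)^{\mathsf T}
S_d = X\,\delta + \varepsilon, \qquad
S_d = \begin{pmatrix} S_1 - S_0 \\ \vdots \\ S_T - S_{T-1} \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & S_0 \\ \vdots & \vdots \\ 1 & S_{T-1} \end{pmatrix}
  = \begin{pmatrix} \mathbf{1} & S_L \end{pmatrix}

% Sample moments without Bessel's correction (divide by T, not T - 1)
\mu_L = \frac{1}{T}\,\mathbf{1}^{\mathsf T} S_L, \qquad
\sigma_L^2 = \frac{1}{T}\, S_L^{\mathsf T} S_L - \mu_L^2

\mu_d = \frac{1}{T}\,\mathbf{1}^{\mathsf T} S_d, \qquad
\sigma_d^2 = \frac{1}{T}\, S_d^{\mathsf T} S_d - \mu_d^2

\sigma_{L,d} = \frac{1}{T}\, S_L^{\mathsf T} S_d - \mu_L\,\mu_d
```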

The OLS estimator in matrix form is:

where the superscript T denotes the matrix transpose. We then have

its inverse:

and

Hence,

i.e.

where ρ is the Pearson correlation coefficient between the lagged series and the differenced series, and

Note that even if we had used Bessel’s correction for the variances, the results for α and β would remain unchanged.
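The OLS steps above can be written out as follows. This is a reconstruction under assumed notation: μ_L and μ_d are the sample means of S_L and S_d, σ_L² and σ_d² their variances with divisor T, and σ_{L,d} their covariance with divisor T. The hats marking estimates are also an assumption.

```latex
% OLS estimator
\hat{\delta} = \left( X^{\mathsf T} X \right)^{-1} X^{\mathsf T} S_d

% Gram matrix and its inverse (the determinant of the 2x2 block is \sigma_L^2)
X^{\mathsf T} X = T \begin{pmatrix} 1 & \mu_L \\ \mu_L & \sigma_L^2 + \mu_L^2 \end{pmatrix},
\qquad
\left( X^{\mathsf T} X \right)^{-1} = \frac{1}{T\,\sigma_L^2}
\begin{pmatrix} \sigma_L^2 + \mu_L^2 & -\mu_L \\ -\mu_L & 1 \end{pmatrix}

% Moment vector
X^{\mathsf T} S_d = T \begin{pmatrix} \mu_d \\ \sigma_{L,d} + \mu_L\,\mu_d \end{pmatrix}

% Resulting estimates
\hat{\alpha} = \mu_d - \frac{\sigma_{L,d}}{\sigma_L^2}\,\mu_L
             = \mu_d - \rho\,\frac{\sigma_d}{\sigma_L}\,\mu_L,
\qquad
\hat{\beta} = \frac{\sigma_{L,d}}{\sigma_L^2} = \rho\,\frac{\sigma_d}{\sigma_L},
\qquad
\rho = \frac{\sigma_{L,d}}{\sigma_L\,\sigma_d}
```

The factors of T cancel between the inverse and the moment vector, which is why Bessel's correction would not change α and β: any common divisor cancels in the ratios.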

Now we need to get the variance of β. From OLS we have that the covariance matrix of δ is:

Then the variance of β is:

where

is the variance of the regression residuals. Note that we have used T-2 instead of T in the denominator because there are only T-2 degrees of freedom for the residuals in OLS, since two constraints hold:

i.e. by construction the mean of the residuals is zero and the covariance between the regressors and the residuals is zero.

Then, expanding the equation for the variance of the residuals and noting that the estimation of the time series differences is:

we get that:

This result would have been a lot messier if we had used Bessel’s correction for the variances.

Then we can express the variance for β as

Finally, after all our hard work we can write the closed form for the Dickey-Fuller test statistic:
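Written out, the variance steps and the resulting statistic take the following form. This is a reconstruction under assumed notation: σ_ε² for the residual variance, hats for estimates, and σ_L², σ_d², σ_{L,d} for the variances and covariance of S_L and S_d computed with divisor T.

```latex
% Covariance matrix of the estimator and the variance of \hat{\beta}
\operatorname{Cov}(\hat{\delta}) = \sigma_\varepsilon^2 \left( X^{\mathsf T} X \right)^{-1}
\quad\Longrightarrow\quad
\operatorname{Var}(\hat{\beta}) = \frac{\sigma_\varepsilon^2}{T\,\sigma_L^2}

% Residual variance with T - 2 degrees of freedom, and the two OLS constraints
\sigma_\varepsilon^2 = \frac{1}{T-2} \sum_{t=1}^{T} \hat{\varepsilon}_t^{\,2},
\qquad
\sum_{t=1}^{T} \hat{\varepsilon}_t = 0,
\qquad
\sum_{t=1}^{T} S_{t-1}\,\hat{\varepsilon}_t = 0

% Fitted differences and the expanded sum of squared residuals
\hat{S}_d = \hat{\alpha}\,\mathbf{1} + \hat{\beta}\,S_L
\quad\Longrightarrow\quad
\sum_{t=1}^{T} \hat{\varepsilon}_t^{\,2}
  = T \left( \sigma_d^2 - 2\hat{\beta}\,\sigma_{L,d} + \hat{\beta}^2 \sigma_L^2 \right)
  = T\,\sigma_d^2 \left( 1 - \rho^2 \right)

% Variance of \hat{\beta} in terms of \rho
\operatorname{Var}(\hat{\beta}) = \frac{\sigma_d^2 \left( 1 - \rho^2 \right)}{(T-2)\,\sigma_L^2}

% Closed form of the Dickey-Fuller statistic, using \hat{\beta} = \rho\,\sigma_d / \sigma_L
DF = \frac{\hat{\beta}}{\sqrt{\operatorname{Var}(\hat{\beta})}}
   = \rho\,\sqrt{\frac{T-2}{1-\rho^2}}
```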

Closed-form expression result

The result for the closed-form expression of the Dickey-Fuller test statistic is:

where T+1 is the sample size of our data and ρ is the correlation coefficient between the lagged time series (sample size T) and the differenced time series (sample size T).

The only thing we need to compute is a correlation coefficient, which is more efficient than computing OLS. This becomes handy in optimization routines and in real-time analysis of time series, where each millisecond counts.

Sanity check

In this section, we will compare our results with the results obtained from the Statsmodels (Python) library. Let us define our functions for the Dickey-Fuller test statistic:
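A minimal sketch of what such functions could look like is given below. The name "get_DF" appears later in this story, but its signature here is an assumption. "get_DF_ols" is a hypothetical NumPy-only stand-in for the OLS route, not the Statsmodels call itself.

```python
import numpy as np


def get_DF(S):
    """Direct Dickey-Fuller statistic (one lag, constant) from a correlation."""
    S = np.asarray(S, dtype=float)
    S_L = S[:-1]        # lagged series, length T
    S_d = np.diff(S)    # differenced series, length T
    T = S_d.size
    rho = np.corrcoef(S_L, S_d)[0, 1]
    return rho * np.sqrt((T - 2) / (1.0 - rho**2))


def get_DF_ols(S):
    """Reference value: t-ratio of beta from an explicit OLS fit (hypothetical helper)."""
    S = np.asarray(S, dtype=float)
    S_L, S_d = S[:-1], np.diff(S)
    T = S_d.size
    X = np.column_stack([np.ones(T), S_L])          # T x 2 design matrix
    delta, *_ = np.linalg.lstsq(X, S_d, rcond=None)  # (alpha, beta)
    resid = S_d - X @ delta
    sigma2 = resid @ resid / (T - 2)                 # residual variance, T - 2 d.o.f.
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance matrix of delta
    return delta[1] / np.sqrt(cov[1, 1])
```

If Statsmodels is installed, `adfuller(S, maxlag=0, regression="c", autolag=None)[0]` from `statsmodels.tsa.stattools` should give the comparable OLS-based value; that this is the exact call the comparison relies on is an assumption.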

We will use an AR(1) unit root process with standard normal increments (Brownian motion) to conduct our tests:
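One way to generate such a process is sketched below. The name "get_unit_root_proc" is used later in this story; the signature, the optional `rng` argument and the choice to return T + 1 points (so that the lagged and differenced series both have length T) are assumptions.

```python
import numpy as np


def get_unit_root_proc(T, rng=None):
    """Return T + 1 points of a driftless random walk S_t = S_{t-1} + eps_t."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(T + 1)  # standard normal increments
    return np.cumsum(eps)             # cumulative sum gives the unit root process
```

Passing a seeded generator, for example `get_unit_root_proc(500, np.random.default_rng(42))`, makes a run reproducible.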

Now we estimate the mean relative error between our direct Dickey-Fuller estimation and the OLS approach (Statsmodels), i.e. |DF_statsmodels - DF_direct| / |DF_direct|. This is precisely what the next function accomplishes: it runs “n_tests” trials with random Brownian motions and returns an array with the relative differences.
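A sketch of such a function follows. The parameter name "n_tests" comes from the text; the function name "get_relative_errors", the `seed` argument and the compact helpers are hypothetical. The reference value here is a plain NumPy OLS fit rather than Statsmodels, so the error magnitudes it produces need not match the figures reported in this story.

```python
import numpy as np


def get_DF(S):
    """Direct statistic: rho * sqrt((T - 2) / (1 - rho^2))."""
    S_L, S_d = S[:-1], np.diff(S)
    rho = np.corrcoef(S_L, S_d)[0, 1]
    return rho * np.sqrt((S_d.size - 2) / (1.0 - rho**2))


def get_DF_ols(S):
    """OLS t-ratio of beta (NumPy stand-in for the library route)."""
    S_L, S_d = S[:-1], np.diff(S)
    T = S_d.size
    X = np.column_stack([np.ones(T), S_L])
    delta, *_ = np.linalg.lstsq(X, S_d, rcond=None)
    resid = S_d - X @ delta
    var_beta = (resid @ resid / (T - 2)) * np.linalg.inv(X.T @ X)[1, 1]
    return delta[1] / np.sqrt(var_beta)


def get_relative_errors(T, n_tests, seed=None):
    """Relative error |DF_ols - DF_direct| / |DF_direct| over n_tests random walks."""
    rng = np.random.default_rng(seed)
    errors = np.empty(n_tests)
    for i in range(n_tests):
        S = np.cumsum(rng.standard_normal(T + 1))  # unit root process, T + 1 points
        df_direct = get_DF(S)
        errors[i] = abs(get_DF_ols(S) - df_direct) / abs(df_direct)
    return errors
```

Calling `get_relative_errors(10_000, 100).mean()` gives the mean relative error for one sample size.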

Running the tests and plotting:

Note that your results will differ because this test is randomized. For a time series of sample size 10,000, however, the relative mean error is around 1%, so our sanity check is a success. In some trials there is a slight difference between the two estimation approaches; this is due to numerical instability caused by the unit root process. Nevertheless, it is something we can live with, considering the increased computational efficiency.

Speed Tests

Now the best part. In this section, we will compare the speed of the OLS approach (Statsmodels) with that of the direct estimation of the Dickey-Fuller test. The following code performs the time test for either of the two functions (approaches).
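A timing helper along these lines could look like the sketch below. The name "time_function", its parameters and the use of `time.perf_counter` are assumptions; the series are generated before the clock starts so that only the statistic itself is timed.

```python
import time

import numpy as np


def get_DF(S):
    """Direct statistic: rho * sqrt((T - 2) / (1 - rho^2))."""
    S_L, S_d = S[:-1], np.diff(S)
    rho = np.corrcoef(S_L, S_d)[0, 1]
    return rho * np.sqrt((S_d.size - 2) / (1.0 - rho**2))


def time_function(func, T, n_runs=100, seed=None):
    """Average wall-clock seconds per call of func on random walks of T + 1 points."""
    rng = np.random.default_rng(seed)
    # Build all inputs first so data generation is excluded from the timing.
    series = [np.cumsum(rng.standard_normal(T + 1)) for _ in range(n_runs)]
    start = time.perf_counter()
    for S in series:
        func(S)
    return (time.perf_counter() - start) / n_runs
```

Calling `time_function(get_DF, T)` and the same with an OLS-based function for each T in a range of sample sizes gives the two curves to compare.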

Running the tests for a sample size range of 100 to 100,000 and plotting:

We can see that for time series with a large sample size the boost in speed is around 50x, but even for smaller sample sizes the speedup is about 10x. So indeed, doing the math paid off.

p-values

In this section, we will code a class to get p-values for the Dickey-Fuller test direct estimation. No statistical tool is complete without its p-values. We will do a Monte Carlo simulation using the AR(1) unit root process as described (and coded) above.

Note that in this class we use the “get_DF” function and the “get_unit_root_proc” from the previous sections.
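A minimal sketch of such a class is shown below. The names "DFProbTable", "get_DF" and "get_unit_root_proc" come from the text; the constructor arguments, the method names "p_value" and "critical_value", and the left-tail convention are assumptions, and the two helpers are repeated in compact form so the block runs on its own.

```python
import numpy as np


def get_unit_root_proc(T, rng):
    """T + 1 points of a random walk with standard normal increments."""
    return np.cumsum(rng.standard_normal(T + 1))


def get_DF(S):
    """Direct statistic: rho * sqrt((T - 2) / (1 - rho^2))."""
    S_L, S_d = S[:-1], np.diff(S)
    rho = np.corrcoef(S_L, S_d)[0, 1]
    return rho * np.sqrt((S_d.size - 2) / (1.0 - rho**2))


class DFProbTable:
    """Monte Carlo table of the DF statistic under the unit root null."""

    def __init__(self, T, n_sims=10_000, seed=None):
        rng = np.random.default_rng(seed)
        stats = [get_DF(get_unit_root_proc(T, rng)) for _ in range(n_sims)]
        self.T = T
        self.sorted_stats = np.sort(stats)  # sorted once for fast lookups

    def p_value(self, df_stat):
        """Left-tail p-value: share of simulated statistics <= df_stat."""
        rank = np.searchsorted(self.sorted_stats, df_stat, side="right")
        return rank / self.sorted_stats.size

    def critical_value(self, level):
        """Empirical quantile of the simulated statistics at the given level."""
        return np.quantile(self.sorted_stats, level)


table = DFProbTable(T=500, n_sims=2000, seed=0)
print(table.p_value(-2.87))        # p-value of an observed statistic
print(table.critical_value(0.05))  # simulated 5% critical value
```

The simulation runs once in the constructor, so every later p-value lookup is a binary search over the sorted statistics. More simulations give a smoother table at the cost of a slower setup.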

As an example, we use the DFProbTable object to get p-values for T = 500:

Final Words

The math in this story was a bit long, to say the least, but in the end, we got a result that was worth it. The formulation of the Dickey-Fuller statistic presented here is useful not only for optimizing computational efficiency but also for understanding the statistic in another way.

There is also a lesson to be learned: sometimes as data scientists, it is very easy to use libraries and model everything without much understanding of the underlying mathematics. Nevertheless, it is a good idea to go deeper into the math, not just as a learning exercise but also as a way to get new and different insights. A little knowledge is a dangerous thing.


I hope this story was useful to you. If I missed anything, please let me know. Follow me on Medium if you would like more stories like this.

Get an email whenever Diego Barba publishes.
