Dickey-Fuller Direct Estimation: Speed Up Test Statistic Computation by up to 50x
Avoid unnecessary regressions and matrix inversions by directly estimating the Dickey-Fuller test statistic through the correlation coefficient.

The Dickey-Fuller test is perhaps the most well-known among stationarity (unit root) tests in time series analysis. The computation procedure for the test relies on linear regression results for the concrete formulation of the statistic. However, linear regression requires matrix inversions, which can be computationally intensive and even numerically unstable.
In this story, we will explore the math behind OLS (ordinary least squares) and use that analysis to derive a closed-form expression for the Dickey-Fuller test statistic (using one time lag and a constant). The resulting expression uses only a correlation coefficient; there are no matrix inversions or other computationally intensive operations. This can speed up computations by up to 50x.
Table of contents
- Closed-form expression derivation
- Closed-form expression result
- Sanity Check
- Speed Test
- p-values
- Final Words
Closed-form expression derivation
If you want to skip the mathematical details, scroll to the next section, no harm done.
For those of you still here, let us first state the problem formally. The (non-augmented) Dickey-Fuller test uses an autoregressive model with one lag and a constant:
$$S_t = \alpha + (1 + \beta)\, S_{t-1} + \varepsilon_t, \qquad t = 1, \dots, T,$$
with ε i.i.d. This equation can be recast so that the time series increment is explicit:
$$S_t - S_{t-1} = \alpha + \beta\, S_{t-1} + \varepsilon_t.$$
α, β and their variances are estimated through OLS.
The Dickey-Fuller test statistic is defined as:
$$\mathrm{DF} = \frac{\beta}{\sqrt{\mathrm{Var}(\beta)}}$$
We could perform OLS regression numerically, get β and its variance and call it a day. This would involve a matrix inversion and matrix multiplications which are computationally taxing. So we are not going to do that.
The other road we could take is doing the math. It seems that nowadays I spend most of my time crunching numbers on the computer and almost no time on the blackboard doing the actual math. In this case, doing the math does pay off.
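First, we will formulate our regression in terms of matrices and vectors as
$$S_d = X \delta + \varepsilon, \qquad \delta = \begin{pmatrix} \alpha \\ \beta \end{pmatrix},$$
where S_d is a vector of dimension T made up of the differences of S,
$$S_d = \begin{pmatrix} S_1 - S_0 \\ S_2 - S_1 \\ \vdots \\ S_T - S_{T-1} \end{pmatrix},$$
and X is a T×2 matrix
$$X = \begin{pmatrix} 1 & S_0 \\ 1 & S_1 \\ \vdots & \vdots \\ 1 & S_{T-1} \end{pmatrix}.$$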
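Let S_L be a T-dimensional vector with the lagged time series S, i.e. the second column of X, and let the index i run over its T components. Then the following relationships hold:
- the mean of S_L:
$$\mu_L = \frac{1}{T} \sum_{i=1}^{T} S_{L,i}$$
- the variance of S_L:
$$\sigma_L^2 = \frac{1}{T} \sum_{i=1}^{T} S_{L,i}^2 - \mu_L^2$$
- the mean of S_d:
$$\mu_d = \frac{1}{T} \sum_{i=1}^{T} S_{d,i}$$
- the variance of S_d:
$$\sigma_d^2 = \frac{1}{T} \sum_{i=1}^{T} S_{d,i}^2 - \mu_d^2$$
- the covariance of S_L and S_d:
$$\sigma_{L,d} = \frac{1}{T} \sum_{i=1}^{T} S_{L,i}\, S_{d,i} - \mu_L \mu_d$$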
Note that we do not use Bessel’s correction because the resulting equations would contain a larger number of terms. When in doubt, go for the result that yields the most beautiful mathematical expression. You could try it yourself by following the next steps using Bessel’s correction for the variances.
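The OLS estimator in matrix form is:
$$\delta = \left( X^{T} X \right)^{-1} X^{T} S_d$$
where the T superscript denotes matrix transpose. We then have
$$X^{T} X = T \begin{pmatrix} 1 & \mu_L \\ \mu_L & \sigma_L^2 + \mu_L^2 \end{pmatrix},$$
its inverse:
$$\left( X^{T} X \right)^{-1} = \frac{1}{T \sigma_L^2} \begin{pmatrix} \sigma_L^2 + \mu_L^2 & -\mu_L \\ -\mu_L & 1 \end{pmatrix},$$
and
$$X^{T} S_d = T \begin{pmatrix} \mu_d \\ \sigma_{L,d} + \mu_L \mu_d \end{pmatrix}.$$
Hence,
$$\delta = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \mu_d - \mu_L\, \sigma_{L,d} / \sigma_L^2 \\ \sigma_{L,d} / \sigma_L^2 \end{pmatrix},$$
i.e.
$$\alpha = \mu_d - \rho\, \frac{\sigma_d}{\sigma_L}\, \mu_L, \qquad \beta = \rho\, \frac{\sigma_d}{\sigma_L},$$
where ρ is the Pearson correlation coefficient between the lagged series and the differenced series, and
$$\rho = \frac{\sigma_{L,d}}{\sigma_L\, \sigma_d}.$$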
Note that even if we had used Bessel’s correction for the variances and the covariance, the results for α and β would remain unchanged, because the correction factors cancel in the ratio.
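As a quick numerical check of the expressions for α and β, the sketch below compares the moment-based formulas with a direct solution of the 2x2 normal equations on a simulated random walk. It uses only the Python standard library; the function names, the seed and the simulated series are illustrative assumptions, not something taken from the derivation itself.

```python
import math
import random


def lag_and_diff(s):
    """Split a series S_0..S_T into the lagged series and its first differences."""
    s_lag = s[:-1]
    s_diff = [b - a for a, b in zip(s[:-1], s[1:])]
    return s_lag, s_diff


def coefficients_from_moments(s):
    """alpha and beta from means, population variances (no Bessel) and rho."""
    s_lag, s_diff = lag_and_diff(s)
    T = len(s_lag)
    mu_l = sum(s_lag) / T
    mu_d = sum(s_diff) / T
    var_l = sum((x - mu_l) ** 2 for x in s_lag) / T
    var_d = sum((y - mu_d) ** 2 for y in s_diff) / T
    cov = sum((x - mu_l) * (y - mu_d) for x, y in zip(s_lag, s_diff)) / T
    rho = cov / math.sqrt(var_l * var_d)
    beta = rho * math.sqrt(var_d / var_l)
    alpha = mu_d - beta * mu_l
    return alpha, beta


def coefficients_from_normal_equations(s):
    """alpha and beta from solving (X^T X) delta = X^T S_d with Cramer's rule."""
    s_lag, s_diff = lag_and_diff(s)
    T = len(s_lag)
    sx = sum(s_lag)
    sxx = sum(x * x for x in s_lag)
    sy = sum(s_diff)
    sxy = sum(x * y for x, y in zip(s_lag, s_diff))
    det = T * sxx - sx * sx
    alpha = (sxx * sy - sx * sxy) / det
    beta = (T * sxy - sx * sy) / det
    return alpha, beta


# Simulated random walk (an assumption made only for this illustration).
random.seed(0)
series = [0.0]
for _ in range(500):
    series.append(series[-1] + random.gauss(0.0, 1.0))

print(coefficients_from_moments(series))
print(coefficients_from_normal_equations(series))
```

Both calls should print the same pair of numbers up to floating-point rounding.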
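Now we need to get the variance of β. From OLS we have that the covariance matrix of δ is:
$$\mathrm{Var}(\delta) = \sigma_\varepsilon^2 \left( X^{T} X \right)^{-1}$$
Then the variance of β is:
$$\mathrm{Var}(\beta) = \frac{\sigma_\varepsilon^2}{T \sigma_L^2}$$
where
$$\sigma_\varepsilon^2 = \frac{1}{T - 2} \sum_{i=1}^{T} \hat{\varepsilon}_i^{\,2}, \qquad \hat{\varepsilon} = S_d - X \delta,$$
is the variance of the regression residuals. Note that we have used T-2 instead of T in the denominator because there are only T-2 degrees of freedom for the residuals in OLS, since the following two constraints hold:
$$\sum_{i=1}^{T} \hat{\varepsilon}_i = 0, \qquad \sum_{i=1}^{T} S_{L,i}\, \hat{\varepsilon}_i = 0,$$
i.e. by construction the mean of the residuals is zero and the covariance between the regressors and the residuals is zero.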
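Then, expanding the equation for the variance of the residuals and noting that the estimate of the time series differences is:
$$\hat{S}_{d,i} = \alpha + \beta\, S_{L,i}$$
we get that:
$$\sigma_\varepsilon^2 = \frac{T}{T - 2}\, \sigma_d^2 \left( 1 - \rho^2 \right)$$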
This result would have been a lot messier if we had used Bessel’s correction for the variances.
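The residual-variance identity, and the two constraints on the residuals, can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `residual_checks`, the seed and the simulated random walk are assumptions made for illustration.

```python
import math
import random


def residual_checks(s):
    """Fit the one-lag regression via moments and return the residual diagnostics."""
    s_lag = s[:-1]
    s_diff = [b - a for a, b in zip(s[:-1], s[1:])]
    T = len(s_lag)
    mu_l = sum(s_lag) / T
    mu_d = sum(s_diff) / T
    var_l = sum((x - mu_l) ** 2 for x in s_lag) / T
    var_d = sum((y - mu_d) ** 2 for y in s_diff) / T
    cov = sum((x - mu_l) * (y - mu_d) for x, y in zip(s_lag, s_diff)) / T
    rho = cov / math.sqrt(var_l * var_d)
    beta = rho * math.sqrt(var_d / var_l)
    alpha = mu_d - beta * mu_l
    # Residuals of the fitted regression S_d = alpha + beta * S_L.
    residuals = [y - alpha - beta * x for x, y in zip(s_lag, s_diff)]
    # Residual variance computed directly, with T-2 degrees of freedom.
    direct = sum(e * e for e in residuals) / (T - 2)
    # Residual variance from the closed form T/(T-2) * var_d * (1 - rho^2).
    closed = T / (T - 2) * var_d * (1.0 - rho * rho)
    return residuals, s_lag, direct, closed


# Simulated random walk (an assumption made only for this illustration).
random.seed(1)
series = [0.0]
for _ in range(300):
    series.append(series[-1] + random.gauss(0.0, 1.0))

residuals, s_lag, direct, closed = residual_checks(series)
print("sum of residuals:", sum(residuals))
print("sum of S_L * residuals:", sum(x * e for x, e in zip(s_lag, residuals)))
print("residual variance (direct, closed form):", direct, closed)
```

The two sums should be zero up to rounding error, and the two variance values should coincide.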
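Then we can express the variance of β as
$$\mathrm{Var}(\beta) = \frac{\sigma_d^2 \left( 1 - \rho^2 \right)}{(T - 2)\, \sigma_L^2}$$
Finally, after all our hard work, we can write the closed form for the Dickey-Fuller test statistic:
$$\mathrm{DF} = \frac{\beta}{\sqrt{\mathrm{Var}(\beta)}} = \rho\, \sqrt{\frac{T - 2}{1 - \rho^2}}$$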
Closed-form expression result
The closed-form expression for the Dickey-Fuller test statistic is:
$$\mathrm{DF} = \rho\, \sqrt{\frac{T - 2}{1 - \rho^2}}$$
where T+1 is the sample size of our data and ρ is the correlation coefficient between the lagged time series (sample size T) and the differenced time series (sample size T).
The only thing we need to compute is a correlation coefficient, which is cheaper than running a full OLS regression. This comes in handy in optimization routines and in real-time analysis of time series, where each millisecond counts.
Sanity check
In this section, we will compare our results with the results obtained from the Statsmodels (Python) library. Let us define our functions for the Dickey-Fuller test statistic:
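A minimal sketch of two such functions follows, assuming a plain Python list as input and using only the standard library. `df_stat_direct` evaluates the statistic from the correlation coefficient, and `df_stat_ols` is a hand-written OLS reference that stands in for a library call. Both function names, the seed and the simulated series are illustrative assumptions.

```python
import math
import random


def df_stat_direct(s):
    """Dickey-Fuller statistic (one lag, constant) from the correlation coefficient."""
    s_lag = s[:-1]
    s_diff = [b - a for a, b in zip(s[:-1], s[1:])]
    T = len(s_lag)
    mu_l = sum(s_lag) / T
    mu_d = sum(s_diff) / T
    # The 1/T factors cancel in rho, so raw sums of squares are enough.
    ss_l = sum((x - mu_l) ** 2 for x in s_lag)
    ss_d = sum((y - mu_d) ** 2 for y in s_diff)
    ss_ld = sum((x - mu_l) * (y - mu_d) for x, y in zip(s_lag, s_diff))
    rho = ss_ld / math.sqrt(ss_l * ss_d)
    return rho * math.sqrt((T - 2) / (1.0 - rho * rho))


def df_stat_ols(s):
    """Reference value: t-statistic of beta from an explicit OLS fit."""
    s_lag = s[:-1]
    s_diff = [b - a for a, b in zip(s[:-1], s[1:])]
    T = len(s_lag)
    sx = sum(s_lag)
    sxx = sum(x * x for x in s_lag)
    sy = sum(s_diff)
    sxy = sum(x * y for x, y in zip(s_lag, s_diff))
    det = T * sxx - sx * sx              # determinant of X^T X
    alpha = (sxx * sy - sx * sxy) / det
    beta = (T * sxy - sx * sy) / det
    ssr = sum((y - alpha - beta * x) ** 2 for x, y in zip(s_lag, s_diff))
    sigma2 = ssr / (T - 2)               # residual variance, T-2 degrees of freedom
    var_beta = sigma2 * T / det          # (2,2) entry of sigma2 * (X^T X)^-1
    return beta / math.sqrt(var_beta)


# Simulated random walk (an assumption made only for this illustration).
random.seed(42)
series = [0.0]
for _ in range(1000):
    series.append(series[-1] + random.gauss(0.0, 1.0))

stat_direct = df_stat_direct(series)
stat_ols = df_stat_ols(series)
print(stat_direct, stat_ols)
```

The two printed values should agree up to floating-point rounding. To compare against Statsmodels itself, the corresponding call should be `adfuller(series, maxlag=0, regression="c", autolag=None)[0]` from `statsmodels.tsa.stattools`, which, to the best of my knowledge, fits the same one-lag regression with a constant.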


