Jacob Ferus

Summary

This article introduces a powerful feature extraction method for time series data using autoregressive (AR) model coefficients, which can be used for classification or regression tasks.

Abstract

Time series feature engineering is a complex area due to the variable length of time series data. One powerful yet rarely mentioned method for extracting features is using autoregressive (AR) model coefficients. The AR model is a simple time series model that forecasts future values by using a linear sum of past lags. When this model is fitted to a time series, it learns a model of fixed parameters that can be used as a fixed-length vector of features. This method is fast to train and has low variance compared to neural networks. It is recommended to have a weakly stationary time series for better performance. If the time series is not stationary, differencing can be used to transform it into a stationary time series. The article demonstrates the use of this feature extraction method on a dataset and compares it to a convolutional neural network, achieving a higher accuracy.

Bullet points

  • Time series feature engineering is complex due to the variable length of the data.
  • Autoregressive (AR) model coefficients can be used for feature extraction.
  • The AR model is a simple time series model that forecasts future values using a linear sum of past lags.
  • When fitted to a time series, the AR model learns a fixed set of parameters that can be used as a fixed-length vector of features.
  • The AR model is fast to train and has low variance compared to neural networks.
  • It is recommended to have a weakly stationary time series for better performance.
  • If the time series is not stationary, differencing can be used to transform it into a stationary time series.
  • The article demonstrates the use of this feature extraction method on a dataset and compares it to a convolutional neural network, achieving a higher accuracy.

MACHINE LEARNING IN PYTHON

The powerful feature extraction method you’ve never heard of

Extracting time series features using autoregressive models.

Generated by using OpenAI DALL·E 2.

Time series feature engineering is a complex area because you are trying to take something of long, possibly variable length and transform it into a short fixed-length vector that can be compared with other time series. This challenge has been tackled with various creative ideas and methods. One of these methods that is rarely mentioned, yet very powerful, is extracting autoregressive (AR) model coefficients.

The AR model is a classical time series model. It is relatively simple compared to larger modern neural networks such as recurrent neural networks, convolutional neural networks or transformers, but it is still very competitive today, especially for smaller datasets. It works by forecasting future values of a time series as a linear sum of past lags. We have:
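In the usual textbook notation (assumed here: c is an intercept, φ_i are the lag coefficients, p is the number of lags and ε_t is a white-noise error term), an AR model of order p can be written as:

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t
```

The coefficients φ_1, ..., φ_p (and optionally c) are the values that later serve as features.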

Okay, so it is a time series model. What does this have to do with feature extraction? Here comes the ingenuity: when you fit a model to a time series, it learns a fixed set of parameters that not only help forecast the future of the time series but also characterize it. Thus, these parameters/coefficients can be used as a fixed-length vector of features! All you have to do is fit one AR model to each time series, using the same number of lags for all of them, and then extract the parameters.

But why the AR model? Why not fit a neural network or some other model? The AR model specifically works well for this because it is fast to train and has low variance. In comparison, neural networks are slow to train and have high variance: simply running the training procedure multiple times could lead to widely different results. The autoregressive integrated moving average (ARIMA) model or the vector autoregression (VAR) model (in the case of multivariate data) would be alternatives, but both of these are slower to train.

Now to the assumptions of the AR model. One thing that is important to have is a weakly stationary time series. It is not a necessary condition for this feature extraction method, but the method will likely perform much better when the assumption is fulfilled. What does this mean? A time series is said to be weakly stationary if its mean and variance are constant over time and the covariance between two points depends only on the lag between them, not on t. In plain English, it is a time series without systematic changes such as trends or seasonal cycles; its behavior remains the same. Why do we want this? Because if the properties of the time series remain the same, the model can treat each observation the same way.

When this assumption is not fulfilled, an operation called differencing can sometimes transform the time series to become stationary. This means that instead of using the raw data we use the changes from point t to t+1. That is:
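Written out, with x_t denoting the original series and x'_t the differenced series (notation assumed here), first-order differencing is:

```latex
x'_t = x_{t+1} - x_t, \qquad t = 1, \dots, T-1
```

The differenced series is one observation shorter than the original. Differencing a second time simply applies the same operation to x'_t.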

A simple example showing how this works is to simulate some data. If we generate data points from a normal distribution, then use the cumulative sum as a time series we get the following:

Cumulative sum of normally distributed data (left) and the difference of the cumulative sum (right)

As can be seen in the left image, the cumulative sum is clearly not stationary, while if we difference the cumulative sum (right image) we do get a stationary time series. In fact, the difference of the cumulative sum is the original time series (minus its first point), that is, the random normally distributed data points. Since these were generated to be independent and identically distributed, the time series is of course stationary.
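The simulation can be reproduced roughly as follows. The seed and the series length of 1000 are arbitrary choices for this sketch, and the plotting is left out.

```python
import numpy as np

rng = np.random.default_rng(42)       # arbitrary seed, for reproducibility
noise = rng.normal(size=1000)         # i.i.d. normal data points
random_walk = np.cumsum(noise)        # non-stationary cumulative sum
differenced = np.diff(random_walk)    # x[t+1] - x[t]

# Differencing drops the first point; the rest is the original noise.
print(len(differenced))
print(np.allclose(differenced, noise[1:]))
```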

The differencing operation can be performed several times. To find the appropriate number of differences, the difference order, we can use the Augmented Dickey-Fuller unit root test (ADF test). This test has the null hypothesis that there exists a unit root (meaning the time series is non-stationary). Thus, we would like to apply the differencing operation until we can reject the null hypothesis at a chosen significance level. The following method does this, and we test it on the normal data that was generated.

It gave the correct result. Note that only 2 differences are tested here. Usually, that should be enough. Otherwise, some other transformation could be used to make the time series stationary, for instance by removing the trend or seasonality.

Alright, with this preparation we are ready to test the feature extraction method, but first we need some data. I found this article where a multi-layer convolutional neural network was used to classify time series data. I will use the same data and compare my results to their model. Their result for the test set was an accuracy of ~96.8%. Let’s load the data:

First, we need to find the appropriate number of differences. The method shown before is used on the first 10 time series:

The result was unanimous: all ten time series needed 0 differences. Thus, we can simply use the original data.

There is one other thing we need to decide: the number of lags. This can be done by finding the best number of lags for each time series, then taking the average to get one number to use for all time series. For this I'm using the function ar_select_order from statsmodels, with the Akaike information criterion (AIC) as the selection criterion. You can read more about it here. To reduce the computational time, I will only use every 10th time series of the training data.

The features are extracted and we are ready to fit a machine learning model. I will train a simple logistic regression model, tuned using cross-validation with the default parameters and then compare it with the other model:

With an accuracy of ~97.7% we beat the accuracy of ~96.8% of the convolutional neural network with just a logistic regression model! I said the method was powerful, didn’t I?

Conclusion

Feature extraction using AR-coefficients is a simple yet effective way of turning a time series into a fixed-length vector of features with discriminative information useful for classification or regression tasks. A must-have in your arsenal of feature extraction techniques.
