N-BEATS — The First Interpretable Deep Learning Model That Worked for Time Series Forecasting
An easy-to-understand deep dive into how N-BEATS works and how you can use it.
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*wIZD2NaC2IUu1f2hd6XHGQ.png)
Time series forecasting has long been one of the few areas in which Deep Learning, including Transformers, did not outperform other models.
Looking at the Makridakis M-competitions, the winning solutions long relied on statistical models. Up to and including the M4 competition, winning solutions were either purely statistical or hybrids of ML and statistical models. Pure ML approaches barely surpassed the competition baseline.
This changed with a paper published by Oreshkin et al. in 2020. The authors proposed N-BEATS, a promising pure Deep Learning approach that beat the winning solution of the M4 competition. It was the first pure Deep Learning approach to outperform well-established statistical approaches.
N-BEATS stands for Neural Basis Expansion Analysis for Interpretable Time Series Forecasting.
In this article, I will walk through the architecture behind N-BEATS. Do not worry: the deep dive will be easy to understand. I will also show you how to make this deep learning approach interpretable. However, understanding how N-BEATS works is not enough on its own. Thus, I will also show you how to easily implement an N-BEATS model in Python and tune its hyperparameters.
Let’s understand the core idea of N-BEATS before looking at its architecture.
The core functionality of N-BEATS lies in neural basis expansion.
Basis expansion is a method to augment data. We expand our feature set to be able to model non-linear relationships. Sounds abstract, right?
How does basis expansion work? Suppose our target value y stands in some relationship to a feature x, and we want to represent that relationship using a linear model. In a 1d feature space, this results in a straight line (left plot in the figure below).
However, the feature and target might not show a linear relationship, resulting in a useless model. Is there anything we can do?
Yes, we can expand our feature set. Let's add the quadratic value of the original feature, resulting in [x, x²]. With this, we moved from a 1d space into a 2d space, since we now have two features instead of one. Fitting a linear model in this 2d space results in a second-degree polynomial model in x (right plot in the figure below).
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*OCw6iIr2QnqXD67fslk5EA.png)
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*lhoA8J5yFo4HtCioR93qFg.png)
And this is all there is to basis expansion. We extend our feature set by adding new features based on the original features. In this case, we used a polynomial expansion of degree 2. We added the quadratic value of each original feature to the feature set.
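As a small illustration of why this helps, here is a NumPy sketch on made-up data with a quadratic relationship: a linear model on the raw feature fails, while the same linear model on the expanded feature set [1, x, x²] fits well. The data and coefficients are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends quadratically on x, plus some noise.
x = np.linspace(-3, 3, 100)
y = 2.0 * x**2 - x + rng.normal(scale=0.5, size=x.size)

# Linear fit on the raw feature [1, x]: cannot capture the curvature.
X_lin = np.column_stack([np.ones_like(x), x])
coef_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

# Polynomial basis expansion of degree 2: feature set becomes [1, x, x^2].
X_poly = np.column_stack([np.ones_like(x), x, x**2])
coef_poly, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

mse_lin = np.mean((X_lin @ coef_lin - y) ** 2)
mse_poly = np.mean((X_poly @ coef_poly - y) ** 2)
print(mse_lin, mse_poly)  # the expanded model fits far better
```

Note that the model itself is still linear; only the feature space changed.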
The most common basis expansion method is polynomial basis expansion. Yet, there are many other approaches, such as binning, piecewise-linear splines, natural cubic splines, logarithms, or squares.
The N-BEATS model decides which basis expansion to use. During training, the model learns the basis that best fits the data; we let the model do the work. That is why it is called "neural basis expansion."
How does N-BEATS work in detail?
The N-BEATS model has the following architecture:
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*wIZD2NaC2IUu1f2hd6XHGQ.png)
A lot is going on here, and the picture might be overwhelming. But, the idea behind N-BEATS is straightforward.
We can observe two things.
First, the model splits the time series into a lookback and a forecast period. The model uses the lookback period to make a forecast. The lookback period's length is a multiple of the forecast period's length; an optimal multiple usually lies between two and six.
Second, the N-BEATS model (yellow rectangle on the right) consists of layered stacks. Each stack, in turn, consists of layered blocks.
Each block has a fork-like structure (blue rectangle on the left). From the block's input, one branch produces a backcast and the other a forecast. The forecast contains the prediction of unseen values. The backcast, in contrast, shows us the model's fit on the input data.
How do we obtain the backcast and forecast from the input? First, the model passes the input through a fully connected neural network with four layers. This MLP produces the expansion coefficients, theta, for the backcast and forecast. These coefficients flow into two branches, one for backcasting and one for forecasting, and each branch performs the basis expansion. This is where the actual "neural basis expansion" happens.
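To make the data flow concrete, here is a minimal NumPy sketch of one generic block. All sizes and weights are illustrative assumptions (in the real model, the weights of the MLP, the heads, and the generic basis are learned jointly):

```python
import numpy as np

rng = np.random.default_rng(42)
lookback, horizon, hidden, n_basis = 12, 4, 16, 8  # illustrative sizes

# Four fully connected layers (weights are random here; learned in practice).
layers = [rng.normal(scale=0.1, size=(lookback, hidden))]
layers += [rng.normal(scale=0.1, size=(hidden, hidden)) for _ in range(3)]

# Linear heads mapping the hidden state to the expansion coefficients theta.
head_b = rng.normal(scale=0.1, size=(hidden, n_basis))
head_f = rng.normal(scale=0.1, size=(hidden, n_basis))

# Generic basis vectors for the backcast and forecast branches.
basis_b = rng.normal(scale=0.1, size=(n_basis, lookback))
basis_f = rng.normal(scale=0.1, size=(n_basis, horizon))

def block(x):
    h = x
    for w in layers:
        h = np.maximum(h @ w, 0.0)                 # ReLU MLP
    theta_b, theta_f = h @ head_b, h @ head_f      # expansion coefficients
    return theta_b @ basis_b, theta_f @ basis_f    # basis expansion

backcast, forecast = block(rng.normal(size=lookback))
print(backcast.shape, forecast.shape)  # (12,) (4,)
```

The backcast has the length of the lookback window, the forecast the length of the horizon.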
As we can see in the picture above, N-BEATS connects several blocks within a stack. Because each block returns a backcast and a forecast, two things happen. First, the model sums the partial forecasts of all blocks to produce the stack's forecast. Second, the model subtracts a block's backcast from that block's input. Hence, each block only receives the residual of the previous block. With this, the model only passes on information that the previous block did not capture. Each block thus tries to approximate only a part of the input signal, focusing on local patterns.
The N-BEATS model then layers several stacks. Like the blocks in a stack, each stack, except the first one, is trained on the residuals of the previous stack. With this, each stack learns a global pattern that was not captured before. The final forecast is the sum of the stacks' forecasts, providing a hierarchical decomposition.
As we can see, N-BEATS applies a double residual stacking approach. The backcast and forecast result in backward and forward residuals. The layered architecture of blocks and stacks leads to the stacking of these residuals. Through the double residual stacks, N-BEATS can recreate the mechanisms of statistical models.
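The double residual logic can be sketched in a few lines. The blocks here are trivial linear stand-ins (random weights, invented sizes), since only the wiring matters:

```python
import numpy as np

lookback, horizon = 12, 4
rng = np.random.default_rng(0)

def make_block(seed):
    """Stand-in for a trained block: two linear maps to (backcast, forecast)."""
    r = np.random.default_rng(seed)
    Wb = r.normal(scale=0.1, size=(lookback, lookback))
    Wf = r.normal(scale=0.1, size=(lookback, horizon))
    return lambda x: (x @ Wb, x @ Wf)

blocks = [make_block(s) for s in range(3)]

residual = rng.normal(size=lookback)  # the input window
forecast = np.zeros(horizon)
partials = []
for blk in blocks:
    backcast, partial = blk(residual)
    residual = residual - backcast  # each block sees only what is left over
    forecast = forecast + partial   # partial forecasts are summed
    partials.append(partial)

# The final forecast is exactly the sum of the blocks' partial forecasts.
print(np.allclose(forecast, np.sum(partials, axis=0)))  # True
```

The same subtract-and-sum pattern repeats one level up, across stacks.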
The advantages of N-BEATS
Compared to other deep learning approaches, N-BEATS enables us to build very deep networks with interpretable results. Moreover, training is faster, as the model does not contain any recurrent or self-attention layers. Its double residual stacking also facilitates a more fluid gradient backpropagation.
Compared to classical time series forecasting approaches, we do not need to do any feature engineering. We do not need to identify time series-specific characteristics, like seasonality and trend. N-BEATS does this for us. This makes the model easy to use, and we can get started quickly.
Moreover, the model is capable of Zero-Shot Transfer Learning.
How can the deep learning architecture be interpretable?
Well, the generic version of N-BEATS I described above is not interpretable. There are no constraints on which basis functions the model can learn or on the depth of the network. We do not know what the model learns, or whether it learns time series-specific components, such as trend.
How do we gain interpretability?
We apply a trick: we restrict the depth of the model and which basis expansion functions it can learn.
For example, we often use trend and seasonality in time series forecasting.
We can force the model to learn only these two characteristics. First, we restrict the depth of the model by only using two stacks. The first stack learns the trend, and the second stack learns the seasonality.
We can then interpret the model’s results by extracting each stack’s partial forecasts.
Second, we must force the model to learn only trend and seasonality. For this, we introduce a problem-specific inductive bias by setting the basis expansion functions to specific functional forms: we replace the last layer in each block with a fixed function, using a polynomial basis to model the trend and a Fourier basis for the seasonality.
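The two constrained bases are easy to write down. This sketch follows the functional forms described above; the horizon, polynomial degree, number of harmonics, and the theta values are illustrative choices:

```python
import numpy as np

horizon = 10
t = np.arange(horizon) / horizon  # normalized forecast time steps in [0, 1)

# Trend stack: low-degree polynomial basis (rows: 1, t, t^2).
degree = 2
trend_basis = np.vstack([t**p for p in range(degree + 1)])  # (3, horizon)

# Seasonality stack: Fourier basis of cosines and sines.
n_harmonics = 3
seasonality_basis = np.vstack(
    [np.cos(2 * np.pi * k * t) for k in range(n_harmonics)]
    + [np.sin(2 * np.pi * k * t) for k in range(1, n_harmonics)]
)

# Given theta from a block's MLP, the partial forecast is a fixed
# weighted sum of the basis vectors, which is directly interpretable.
theta = np.array([1.0, 0.5, -0.2])  # hypothetical trend coefficients
trend_forecast = theta @ trend_basis
print(trend_forecast.shape)  # (10,)
```

Because the basis is fixed, a stack's partial forecast can only express the pattern its basis encodes, which is exactly what makes the decomposition readable.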
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*W7rN239dhvIB1IM0OUI1Hg.png)
Forecasting example using N-BEATS
Now that we know how N-BEATS works, let’s apply the model to a forecasting task.
We will predict the next two weeks of wholesale electricity prices in Germany. The data is provided by Ember in the “European Wholesale Electricity Price” dataset under a CC-BY-4.0 license. We will use the N-BEATS implementation from Nixtla’s neuralforecast library, which makes it very easy for us to apply N-BEATS.
![](https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*x7Jyky7eI0E_51-vs7xJsw.png)
Please note that I am not trying to get a forecast as accurate as possible but rather show how we can apply N-BEATS.
Alright. Let’s get started.
Let’s import all the libraries we need and the dataset.