Summary

Nixtla's mega study on time series forecasting reveals that machine learning methods, such as NHITs and TimeGPT, outperform traditional methods like ETS, TBATS, and Theta, with LGBM significantly outperforming all other models for hourly data.

Abstract

Nixtla's 2023 mega study on time series forecasting, based on a dataset containing 100 billion time series points, challenges longstanding academic forecasting beliefs. The study reveals that machine learning methods, such as NHITs and TimeGPT, outperform traditional methods like ETS, TBATS, and Theta. DeepAR, however, did not add significant value compared to the simple SeasonalNaive benchmark. For monthly data, NHITs closely followed TimeGPT, with negligible performance differences, while traditional models like Theta and its variant DOTheta exhibited strong results. For weekly data, NHITs nearly matched TimeGPT’s performance, with minimal differences, and DeepAR showed minimal enhancements compared to the SeasonalNaive benchmark. For daily data, NHITs nearly matched TimeGPT’s performance, and DeepAR did better but could not outperform DOTheta, a variant of the simple Theta model. For hourly data, LGBM significantly outperformed TimeGPT and all other models, with NHITS being a strong solution again.

Opinions

Machine learning methods, such as NHITs and TimeGPT, outperform traditional methods like ETS, TBATS, and Theta in time series forecasting.
DeepAR did not add significant value compared to the simple SeasonalNaive benchmark.
Traditional models like Theta and its variant DOTheta exhibited strong results for monthly and daily data.
NHITs closely followed TimeGPT for monthly and weekly data, with negligible performance differences.
LGBM significantly outperformed TimeGPT and all other models for hourly data.
The absence of TBATS, ARIMA, and NBEATs in Nixtla’s study is not a reflection on their performance as these methods were not included in this study.
Nixtla’s study and results are proprietary and cannot be reproduced, so they should be taken as they are.

What Truly Works in Time Series Forecasting — The Results from Nixtla’s Mega Study

Time series is a captivating domain where the quest for a crystal ball never ceases.

Uncovering the best forecasting techniques has always been a pursuit in the field. While many regard understanding effective methods as the Holy Grail, it’s equally vital to identify those that fall short — take Facebook Prophet as a case in point.

For four decades, the mantra in applied forecasting has been ‘simpler methods prevail,’ influenced largely by the results of the M series forecasting competitions.

These competitions saw minimal machine learning participation, even from their organizers. Despite the M4 forecasting competition’s (2018) top two solutions being machine learning-based, the organizers remained stubbornly anti-machine learning, suggesting it’s ‘still up in the air’ whether machine learning surpasses traditional techniques like exponential smoothing in time series forecasting.

A few years after the conclusion of the M4 competition, the organizers shifted the subsequent M5 competition to Kaggle. For the first time, this move introduced the ‘M-competitions’ to machine learning. The outcome? A resounding upheaval of the longstanding academic forecasting beliefs, with undeniable proof — every top solution relied on machine learning — highlighting machine learning as the future of time-series forecasting.

Fast forward to 2023, when Nixtla published the results of the first mega study based on a dataset containing 100 billion time series points.

The results from Nixtla’s TimeGPT mega study

Nixtla’s study and results are proprietary and cannot be reproduced, so we will take them as they are.

What new insights does the study bring? As a point of comparison, we will use the Monash Time Series Forecasting Repository (https://forecastingdata.org/), which has so far been the best publicly available reproducible study.

Monthly data:

Monash Time Series Forecasting Repository Insights: While the repository ranks ETS and TBATS as top methods for monthly data, Nixtla’s research added a fresh perspective by incorporating extensive machine and deep learning methods. Their findings revealed that NHITs closely followed TimeGPT, with negligible performance differences for monthly datasets.

Traditional models like Theta (M3 competition’s victor) and its variant DOTheta exhibited strong results. DeepAR added little value, only slightly outpacing the simple SeasonalNaive benchmark, and ETS’s edge over SeasonalNaive was also minimal. Nixtla’s research did not feature TBATS, a strong benchmark, despite its known efficacy with monthly data.
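To make "slightly outpacing SeasonalNaive" concrete, here is a minimal sketch of how a model can be scored against a seasonal naive benchmark. The article does not say which error metric backs these comparisons, so relative MAE is an assumption made here purely for illustration. The toy series, the candidate forecast, and the helper names are all hypothetical and are not taken from Nixtla's study.

```python
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], season_length: int, horizon: int) -> List[float]:
    """Forecast by repeating the last observed season over the horizon."""
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def relative_mae(actual: Sequence[float],
                 model_forecast: Sequence[float],
                 benchmark_forecast: Sequence[float]) -> float:
    """Model MAE divided by benchmark MAE (assumed metric, see the note above)."""
    return mae(actual, model_forecast) / mae(actual, benchmark_forecast)


# Hypothetical monthly series: two years of history, three months to forecast.
history = [10, 12, 14, 13, 15, 18, 20, 19, 16, 14, 12, 11,
           11, 13, 15, 14, 16, 19, 21, 20, 17, 15, 13, 12]
actual = [12, 14, 16]

benchmark = seasonal_naive(history, season_length=12, horizon=3)
candidate = [12.5, 14.5, 15.5]  # made-up forecast from some other model

score = relative_mae(actual, candidate, benchmark)
print(benchmark, score)
```

A score below 1 means the candidate beats the seasonal naive benchmark, and a score close to 1 means it adds little over it, which is how DeepAR's monthly result is described above.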

Weekly data:

Monash Time Series Forecasting Repository Highlights: NBEATS and TBATS emerge as top contenders. Intriguingly, Nixtla omitted TBATS from their research.

Nixtla’s Findings: Surprisingly, NBEATS is also absent from their study. NHITs nearly matched TimeGPT’s performance on weekly data sets, showcasing minimal differences. DeepAR, once more, showed minimal enhancements compared to the straightforward seasonal naive benchmark. It’s a letdown that other models lagged in weekly data performance, and the absence of TBATS (known for its efficacy in monthly and weekly forecasting) in Nixtla’s study is particularly noteworthy.

Daily data:

Monash Time Series Forecasting Repository Highlights: TBATS is the top contender. Intriguingly, Nixtla omitted TBATS from their research.

Nixtla’s Findings:

NHITs nearly matched TimeGPT’s performance on daily data sets, showcasing minimal differences. DeepAR did better on daily data but could not outperform DOTheta, a variant of the simple Theta model; once again Theta and DOTheta performed very well.

Hourly data:

Monash Time Series Forecasting Repository Highlights: There is no clear evidence because the repository contains too little hourly data, although Pooled Regression is mentioned as the winning solution for one dataset.

As confirmed by Nixtla, the absence of TBATS, ARIMA, and NBEATs is not a reflection on their performance, as these methods were not included in this study.

Nixtla’s Findings: LGBM significantly outperformed TimeGPT and all other models. NHITS was a strong solution again.

References:

TimeGPT-1: https://arxiv.org/abs/2310.03589
Monash Time Series Forecasting Repository: https://forecastingdata.org/