Valeriy Manokhin, PhD, MBA, CQF

Summary

Deep learning methods are not recommended for time series forecasting unless one has extensive resources and expertise, as simpler statistical and machine learning methods often yield better results with lower costs.

Abstract

The article emphasizes that deep learning, despite its popularity, is not the most effective approach for time series forecasting for most companies. It cites a November 2022 study by Nixtla showing that deep learning methods not only underperformed compared to simpler econometric and statistical methods but also incurred significantly higher costs. The article advises against the common misconception that deep learning is superior, pointing out that claims of its performance are often biased or based on misrepresented data. It suggests that companies should focus on proven statistical, econometrics, and machine learning tools to build effective forecasting systems that deliver business value. The article also highlights a research paper from MIT where a simple benchmark model outperformed a sophisticated transformer-based architecture, reinforcing the argument that deep learning is not necessarily the best choice for forecasting tasks.

Opinions

  • Deep learning methods are often promoted by parties with vested interests, such as tech firms selling GPU hours or consultants.
  • The industry has examples of failed deep learning forecasting projects, suggesting that the method is not as reliable as traditional approaches.
  • Companies like Walmart, Target, and others have found more success with non-deep learning forecasting solutions.
  • Transformers, while effective in natural language processing (NLP), are considered fundamentally unsuitable for time series due to error accumulation and the inherent differences between language and time series data.
  • The article criticizes the practice of using only deep learning for time series forecasting, suggesting it is not a wise decision.
  • Even experts from Amazon's forecasting R&D team have acknowledged that deep learning may not be the best tool for the average forecasting use case.
  • The article implies that readers should be skeptical of claims made by deep learning proponents and instead trust empirical evidence and independent studies.
  • It is recommended to follow the author on Medium, Twitter, and LinkedIn for more insights into time series and forecasting.

Deep Learning Is What You Do Not Need

Unless you have tons of clean data and dozens of top PhDs who have worked on forecasting for over a decade, as Amazon and Alibaba do, you do not need deep learning for forecasting. And even then, would you simply take for granted claims from the same companies that sell GPU usage?

November 2022 update: a large-scale independent study by Nixtla confirmed that deep learning methods failed to outperform a simple ensemble of econometric and statistical methods while costing 25,000 times more (0.05 cents for the econometrics/statistics ensemble vs $11,000 of GPU costs for deep learning).
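To make "a simple ensemble of econometrics and statistical methods" concrete, here is a minimal Python sketch. It does not reproduce the models Nixtla actually used; as an assumption, it averages three textbook forecasters (naive, seasonal naive and simple exponential smoothing with a fixed, hypothetical smoothing weight `alpha=0.3`), and all function names are illustrative.

```python
from statistics import mean

def naive(history, h):
    """Repeat the last observed value for each of the h future steps."""
    return [history[-1]] * h

def seasonal_naive(history, h, m):
    """Repeat the last full season (m observations) forward."""
    if len(history) < m:
        raise ValueError("history must contain at least one full season")
    last_season = history[-m:]
    return [last_season[i % m] for i in range(h)]

def ses(history, h, alpha=0.3):
    """Simple exponential smoothing with a fixed, hypothetical weight alpha;
    the forecast stays flat at the final smoothed level."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * h

def ensemble_forecast(history, h, m):
    """Illustrative ensemble: equal-weight average of the three forecasts at each step."""
    components = [naive(history, h), seasonal_naive(history, h, m), ses(history, h)]
    return [mean(step) for step in zip(*components)]

if __name__ == "__main__":
    sales = [10, 12, 14, 11, 13, 15]  # toy series with a season length of 3
    print(ensemble_forecast(sales, h=3, m=3))
```

Nothing here needs a GPU: each forecast costs a handful of arithmetic operations per series, which is the point about cost.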

When it comes to time series and forecasting, “Deep Learning Is What You Do Not Need.”

Do not let the deluge of papers about deep learning for time series (including ones produced by top tech companies and at top conferences) overwhelm you.

There is no evidence that deep learning methods, including transformers, outperform statistical and machine learning methods.

Most of the claims about the performance of deep learning come either from conflicted parties (including tech firms like Alibaba interested in selling more GPU hours) or academic labs that either by design or omission misrepresent the results of deep learning performance compared to other methods.

Such misrepresentation involves recycling the same toy datasets, dataset arbitrage (only showing results on datasets where deep learning works better), omitting non-deep learning benchmarks, not using correct benchmarks and many other tricks.
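One way to guard against dataset arbitrage and missing benchmarks is to score every model against a simple benchmark on every dataset and report the aggregate. A minimal sketch, assuming the mean absolute scaled error (MASE, scaled by the in-sample error of the seasonal naive method) as the metric and a geometric mean of error ratios as the summary; both choices and all function names are illustrative, not taken from any particular study.

```python
from math import exp, log

def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error: forecast MAE divided by the in-sample
    one-step MAE of the seasonal naive method with period m (m=1 is plain naive).
    A constant training series gives a zero scale and is not handled here."""
    if len(train) <= m:
        raise ValueError("training series must be longer than the season length m")
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    return mae(actual, forecast) / scale

def relative_score(results):
    """Geometric mean of model/benchmark error ratios over *every* dataset.

    `results` maps dataset name -> (model_error, benchmark_error).
    Values below 1.0 mean the model beats the benchmark on average;
    quietly dropping datasets where it loses is what this is meant to expose.
    """
    ratios = [model / bench for model, bench in results.values()]
    return exp(sum(log(r) for r in ratios) / len(ratios))
```

Reporting `relative_score` over all datasets, alongside the benchmark's own scores, leaves little room for cherry-picking.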

When starting the forecasting journey, don’t get distracted from fundamentals.

And whoever tells you that your company needs deep learning for forecasting is often either unfamiliar with the subject of time series or has a vested interest in selling an ineffective piece of consulting advice or a forecasting technology that does not work.

The industry is littered with examples of deep learning projects that resulted in ineffective solutions at best and complete disasters at worst, with deep learning systems blowing up in production.

And outside Amazon and Alibaba, one never hears about successfully implemented deep learning solutions, whereas many top companies, such as Walmart and Target, have either tried deep learning unsuccessfully and abandoned it or built effective forecasting solutions without it.

As for your average Joe Bloggs Inc, a multibillion-dollar super-duper company listed on the cool Nasdaq or the not-so-cool NYSE, deep learning does not work, no matter what your expensive consultants or your not-so-deep-in-forecasting-knowledge, PhD-in-irrelevant-field Data Scientist tells you.

Fire your expensive consultants and PhD-in-irrelevant-field Data Scientist (hello, Zillow) and save yourself a lot of time, trouble and millions in wasted project costs and foregone profits.

DO NOT use deep learning for forecasting unless you have tons of clean data, an expert team and a lot of time to play with these toys. Do not touch deep learning until you have built an effective forecasting system that delivers business value using statistical, econometrics and machine learning tools.

โ€ FreDo: Frequency Domain-based Long-Term Time Series Forecastingโ€, a research ๐Ÿง paper from MIT, pitted super fancy transformer architecture against a simple, almost mechanical benchmark.

๐•‹๐•ƒ;๐”ปโ„ Transformer loses grotesquely

Let me repeat. A simple, almost mechanistic benchmark forecasting model from the Massachusetts Institute of Technology totally decimated a sophisticated transformer-based architecture. And not just one of your average transformers: the best and brightest of time series transformers, the one that beats another transformer, which beats yet another, and so on.

๐•๐•–๐•ค, ๐•ช๐• ๐•ฆ ๐•™๐•’๐•ง๐•– ๐•™๐•–๐•’๐•ฃ๐•• ๐•š๐•ฅ ๐•ฃ๐•š๐•˜๐•™๐•ฅ. ๐•Š๐•š๐•ž๐•ก๐•๐•– ๐•’๐•๐•ž๐• ๐•ค๐•ฅ ๐•ž๐•–๐•”๐•™๐•’๐•Ÿ๐•š๐•ค๐•ฅ๐•š๐•” ๐•“๐•–๐•Ÿ๐•”๐•™๐•ž๐•’๐•ฃ๐•œ ๐••๐•–๐•”๐•š๐•ž๐•’๐•ฅ๐•–๐•ค ๐•ฅ๐•™๐•– ๐•ž๐•–๐•’๐•Ÿ๐•–๐•ค๐•ฅ ๐•ฅ๐•š๐•ž๐•– ๐•ค๐•–๐•ฃ๐•š๐•–๐•ค ๐•ฅ๐•ฃ๐•’๐•Ÿ๐•ค๐•—๐• ๐•ฃ๐•ž๐•–๐•ฃ. Like a bunch of kids in Michael Bayโ€™s movie won vs the meanest transformer ๐Ÿฟ

Simple mechanistic benchmark vs transformer
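For readers wondering what a "simple, almost mechanical benchmark" can look like, here is a hedged Python sketch of a seasonal-average baseline: average the complete periods in the history position by position, then repeat that profile forward. It is in the spirit of the benchmark used in the FreDo paper but is not claimed to be the paper's exact model; the helper name is hypothetical and the `period` argument (e.g. 24 for hourly data with a daily cycle) is an assumption the user must supply.

```python
def average_tile_forecast(history, period, horizon):
    """Hypothetical seasonal-average baseline.

    Averages all complete periods in the history position by position,
    then repeats (tiles) that average profile over the forecast horizon.
    Any leading partial period is ignored.
    """
    n_periods = len(history) // period
    if n_periods == 0:
        raise ValueError("history must contain at least one full period")
    # Align to the end of the series so the most recent full periods are used.
    recent = history[len(history) - n_periods * period:]
    profile = [
        sum(recent[k * period + i] for k in range(n_periods)) / n_periods
        for i in range(period)
    ]
    # `recent` ends on the last position of a period, so the next step is position 0.
    return [profile[h % period] for h in range(horizon)]
```

For example, `average_tile_forecast([1, 2, 3, 3, 4, 5, 5, 6, 7], period=3, horizon=4)` returns `[3.0, 4.0, 5.0, 3.0]`. There is nothing to train and no hyperparameter beyond the period.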

Like transformers? Stick to NLP. Transformers are excellent for the 😎 NLP domain, just not as awesome for #timeseries. Not at all. Please don’t use them. Multiple research papers say that transformers are fundamentally unsuitable for time series for very good reasons, and one never sees them winning forecasting and Kaggle competitions, also for good reasons.

It is not all about transformers. Using only deep learning for time series is generally not the brightest idea 💡. Even the most brilliant forecasting experts at Amazon’s forecasting R&D fell into the trap of thinking that the average forecasting use case is amenable to deep learning. After the Kaggle Walmart M5 forecasting competition, they wrote a paper, “Forecasting with trees”, saying they had finally realised deep learning is not the go-to tool for forecasting.
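Tree-based forecasters of the kind the Amazon paper discusses typically treat forecasting as ordinary regression on lagged values of the series. A minimal Python sketch of that framing, using a hypothetical helper `make_lag_features`; in practice such pipelines often add calendar and price features as well, which this sketch omits.

```python
def make_lag_features(series, lags):
    """Hypothetical helper: turn a univariate series into (X, y) rows.

    Each row of X holds the values at t - lag for every lag in `lags`,
    and the matching entry of y is the value at time t.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y
```

For example, `make_lag_features([10, 11, 12, 13, 14], lags=[1, 2])` yields `X = [[11, 10], [12, 11], [13, 12]]` and `y = [12, 13, 14]`. The resulting `X` and `y` can be passed to any gradient-boosted tree regressor, such as scikit-learn’s `HistGradientBoostingRegressor` via its `fit(X, y)` method.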

The reason transformers don’t work well for time series is straightforward: errors accumulate in transformer-based architectures, and there is nothing even a big transformer can do about it. There are other good reasons, including that NLP differs from time series, so just because something works in NLP does not mean it will work in time series. Time series are not the same as word sequences: in sentences, only context matters, whereas in time series, the order of the values is what matters.
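The error-accumulation argument is easiest to see in any recursive multi-step forecaster, where each prediction is fed back in as the next input. The sketch below is a generic AR(1)-style illustration, not a transformer, and proves nothing about transformers specifically: with a hypothetical "true" persistence of 1.0 and a model coefficient that is off by 2%, the absolute error grows at every step of the horizon.

```python
def recursive_forecast(last_value, coef, horizon):
    """Roll a one-step AR(1)-style rule forward, feeding each prediction
    back in as the input for the next step."""
    preds = []
    current = last_value
    for _ in range(horizon):
        current = coef * current
        preds.append(current)
    return preds

# Hypothetical numbers: the "true" dynamics keep the level flat (coef 1.0),
# while the fitted model overestimates persistence by 2% (coef 1.02).
true_path = recursive_forecast(100.0, 1.00, horizon=10)
model_path = recursive_forecast(100.0, 1.02, horizon=10)
abs_errors = [abs(m - t) for m, t in zip(model_path, true_path)]

for step, err in enumerate(abs_errors, start=1):
    print(f"h={step:2d}  abs error={err:.2f}")
```

The one-step mistake is the same 2% at every step, yet the feedback loop turns an error of 2.00 at h=1 into about 21.90 at h=10.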

Some consolation for DeepAR and others: they are now in the good company of other deep learning models for time series that do not work so well.

ARIMA, boosted trees, …, a secret-sauce, lesser-known approach [hey, it’s a free Medium article; did you expect free expert advice? 🔑] > deep learning.

Want to learn more about time series and forecasting? Follow me on Medium, Twitter and LinkedIn.

References:

  1. Nixtla, “Statistical vs Deep Learning forecasting methods”
  2. “Python vs R for forecasting”
  3. Amazon time series R&D lab, “Forecasting with Trees”
  4. Kaggle, Walmart M5 forecasting competition