Saleha Shaikh

Summary

Google has introduced TimesFM, a decoder-only foundation model for time-series forecasting that leverages zero-shot learning capabilities.

Abstract

Google's TimesFM represents a significant advancement in time-series forecasting, offering a model that can predict future data points without task-specific training. This decoder-only model, pre-trained on a vast corpus of 100 billion real-world time-points, is designed to handle a variety of forecasting tasks across different domains. It utilizes transformer layers and a unique architecture that processes time-series data in patches, allowing for efficient and accurate predictions. TimesFM's performance has been benchmarked against traditional statistical methods and deep learning models, demonstrating superior results, particularly in zero-shot testing scenarios. The model's upcoming availability on Google Cloud Vertex AI is poised to revolutionize forecasting by minimizing development time and enhancing predictive accuracy for businesses.

Opinions

  • The author believes that TimesFM's zero-shot learning capability is a game-changer for time-series forecasting, reducing the need for extensive model tuning.
  • TimesFM's architecture, which treats patches of contiguous time-points as tokens and uses a multilayer perceptron block with residual connections, is considered innovative and effective.
  • The use of both synthetic and real-world data for pretraining is seen as crucial for the model's ability to generalize across diverse domains.
  • The author expresses that TimesFM's performance, as evidenced by its evaluation against benchmarks like the Monash Forecasting Archive, is superior to existing statistical and deep learning models.
  • The collaboration across Google Research and Google Cloud in developing TimesFM is highlighted as a testament to its potential impact on various industries.
  • The author anticipates that making TimesFM available on Google Cloud Vertex AI will greatly benefit external customers by streamlining the forecasting process and increasing model accessibility.

Google Just Published a Decoder-Only Model for Time-Series Forecasting With Zero-Shot Learning!!

Time series forecasting drives much of business today, and at the moment it is mostly done with conventional ML/NN models. But Google just published TimesFM, a model that can make time series predictions with zero-shot learning!!

Photo by Isaac Smith on Unsplash


Time Series Prediction

Time series prediction is very important for businesses as it enables them to make decisions by leveraging historical data and future projections. By analyzing trends, companies can enhance their planning, optimize resource allocation, and mitigate challenges.

In the finance sector, it empowers investors to make choices and steer clear of pitfalls. In supply chain management it helps in maintaining inventory levels to minimize costs. In marketing and sales, it assists in determining the timing for promotions to attract customers and boost revenue. Overall, time series prediction serves as a tool for businesses to enhance decision-making capabilities and remain competitive in an evolving world.

Photo by Trevor Vannoy on Unsplash

Conventional way of time series prediction

At present, there are several approaches to time series prediction used in industry and academic research. The method chosen usually depends on the characteristics of the data and the specific prediction task at hand. Let's review some of the state-of-the-art models that are widely used in today's market:

1. ARIMA (AutoRegressive Integrated Moving Average)

ARIMA is a classical statistical model. It combines autoregressive (AR) and moving average (MA) components with differencing (the "integrated" part) to handle non-stationary data. It yields good performance for time series data with a strong trend or seasonality.
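As a quick illustration, here is a minimal ARIMA sketch using statsmodels; the synthetic monthly series and the order (1, 1, 1) are assumptions for demonstration, not a tuned configuration.

```python
# Minimal ARIMA sketch with statsmodels; the order (p, d, q) = (1, 1, 1)
# and the synthetic series below are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2020-01-01", periods=48, freq="MS")       # monthly data
y = pd.Series(np.linspace(100, 200, 48) + np.random.normal(0, 5, 48), index=idx)

model = ARIMA(y, order=(1, 1, 1))    # AR(1), first differencing, MA(1)
fitted = model.fit()
print(fitted.forecast(steps=6))      # forecast the next 6 months
```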

2. LSTM (Long Short-Term Memory) Networks

LSTM is a type of recurrent neural network (RNN). It is a deep learning model designed to capture long-term dependencies in sequential data.
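Below is a minimal sketch of a one-step-ahead LSTM forecaster in Keras; the window length, layer sizes, and toy signal are assumptions for illustration only.

```python
# Minimal LSTM forecasting sketch (Keras); window size and layer sizes are
# illustrative assumptions, not tuned values.
import numpy as np
from tensorflow import keras

series = np.sin(np.linspace(0, 20, 500))                 # toy signal
window = 24

# Sliding windows of shape (samples, timesteps, features) and next-step targets
X = np.array([series[i:i + window] for i in range(len(series) - window)])[..., None]
y = series[window:]

model = keras.Sequential([
    keras.Input(shape=(window, 1)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),                               # predict the next point
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

next_point = model.predict(X[-1:], verbose=0)            # one-step-ahead forecast
```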

3. Prophet

Prophet (also known as FBProphet) was developed by Facebook. It is a forecasting tool designed for time series data that exhibits patterns on different time scales, and it handles missing data and outliers well.
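A minimal Prophet sketch looks roughly like the following; the synthetic daily data is an assumption purely for demonstration (Prophet expects a dataframe with ds and y columns).

```python
# Minimal Prophet sketch; the synthetic daily data is an illustrative assumption.
import numpy as np
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=365, freq="D"),
    "y": 100 + 10 * np.sin(np.arange(365) * 2 * np.pi / 7) + np.random.normal(0, 2, 365),
})

m = Prophet()                                   # seasonality detected automatically
m.fit(df)
future = m.make_future_dataframe(periods=30)    # extend 30 days into the future
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```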

4. XGBoost (Extreme Gradient Boosting)

XGBoost is an ensemble learning algorithm. It combines the predictions of multiple weak models to generate the final prediction. It is very good at finding complex relationships in time series data.
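One common way to apply XGBoost to forecasting is to frame it as supervised regression on lag features, roughly as in the sketch below; the number of lags and the hyperparameters are assumptions for illustration.

```python
# Minimal XGBoost forecasting sketch using lag features; the number of lags
# and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

series = pd.Series(np.random.normal(0, 1, 500)).cumsum()     # toy random walk

lags = 7                                                      # predict t from t-1 .. t-7
df = pd.DataFrame({f"lag_{i}": series.shift(i) for i in range(1, lags + 1)})
df["target"] = series
df = df.dropna()

X, y = df.drop(columns="target"), df["target"]
model = XGBRegressor(n_estimators=200, max_depth=3)
model.fit(X.iloc[:-50], y.iloc[:-50])                         # hold out the last 50 points
preds = model.predict(X.iloc[-50:])
```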

Which model to use depends on the type of data. However, it is worth noting that a considerable amount of time is spent tuning and finalizing these models. To solve this problem and to make use of recent advancements in LLMs (generative AI), Google has launched a new model, "TimesFM," a decoder-only foundation model that can perform time series prediction with zero-shot learning!!!

Google TimesFM: A decoder-only foundation model for time-series forecasting

TimesFM is designed as a pre-trained forecasting model with just 200 million parameters, leveraging a large time-series corpus of 100 billion real-world time-points.

Unlike large language models (LLMs) that are typically trained in a decoder-only fashion for natural language processing, TimesFM adapts the decoder-only architecture to time-series forecasting. The model employs stacked transformer layers, treating patches of contiguous time-points as tokens. A unique feature is the use of a multilayer perceptron block with residual connections to convert time-series patches into tokens. During training, the model learns to forecast subsequent time-points, with the output patch length potentially exceeding the input patch length. This architecture allows for efficient forecasting with fewer generation steps during inference.

To enhance the model’s performance, a combination of synthetic and real-world data is used for pretraining. Synthetic data aids in grasping fundamental temporal patterns, while a curated corpus of 100 billion time-points, including data from Google Trends and Wikipedia Pageviews, provides real-world flavor, helping the model generalize across diverse domains.

Model architecture

Transformers have shown promising results in adapting to different context lengths. The model breaks the time-series down into several patches during training. A patch of a time series is a natural analog for a token in language models and has been shown to improve performance. Moreover, this improves inference speed, as the number of tokens being fed into the transformer is reduced by a factor of the patch length.
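To make the patching idea concrete, here is a rough sketch; the context length of 512 and patch length of 32 are assumptions for illustration, not TimesFM's actual configuration.

```python
# Sketch of patching: a 512-point context becomes 16 patch "tokens" of length 32,
# so the transformer attends over 16 tokens instead of 512 time-points.
import numpy as np

context = np.random.randn(512)              # input time-series context (assumed length)
patch_len = 32                              # assumed patch length
patches = context.reshape(-1, patch_len)    # shape (16, 32)
print(patches.shape)
```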

TimesFM is a decoder-only model, meaning that for a given sequence of input patches, the model predicts the next patch as a function of all past patches. The model has a preprocessing unit in the form of input layers that converts the time series into input tokens. First, the input is broken into contiguous non-overlapping patches. Each patch is then processed by a Residual Block into a vector of the model dimension size, and a binary padding mask is supplied along with the input. The Residual Block is essentially a Multi-layer Perceptron (MLP) block with one hidden layer and a skip connection.
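A hedged PyTorch sketch of such a residual block, an MLP with one hidden layer plus a skip connection that maps each patch to a model-dimension vector, might look like this; the layer sizes and activation are assumptions, not the exact TimesFM implementation.

```python
# Sketch of the input residual block: MLP with one hidden layer plus a skip
# connection, turning each patch into a model-dimension token. Sizes and the
# activation function are illustrative assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, patch_len: int, hidden_dim: int, model_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(patch_len, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, model_dim),
        )
        self.skip = nn.Linear(patch_len, model_dim)   # skip connection

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        return self.mlp(patch) + self.skip(patch)

tokens = ResidualBlock(32, 128, 256)(torch.randn(16, 32))   # 16 patches -> 16 tokens
print(tokens.shape)                                          # torch.Size([16, 256])
```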

The model has a stack of transformer layers placed on top of each other. Each of these layers has the standard multi-head self-attention (SA) followed by a feedforward network (FFN). The model uses causal attention: each output token can only attend to input tokens that come before it in the sequence (including the corresponding input token). This can be described by the equation:

o_j = StackedTransformer((t_1, ṁ_1), …, (t_j, ṁ_j)), ∀ j ∈ [N]

where t_j is the j-th input token and ṁ_j is the corresponding entry of the binary padding mask.
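In PyTorch terms, the stacked causal transformer over the patch tokens can be sketched roughly as follows; the model dimension, head count, and layer count are assumptions for illustration.

```python
# Rough sketch of a decoder-only stack with causal self-attention over patch
# tokens; model dimension, heads, and depth are illustrative assumptions.
import torch
import torch.nn as nn

model_dim, n_layers, n_tokens = 256, 4, 16
layer = nn.TransformerEncoderLayer(d_model=model_dim, nhead=8,
                                   dim_feedforward=512, batch_first=True)
stack = nn.TransformerEncoder(layer, num_layers=n_layers)

tokens = torch.randn(1, n_tokens, model_dim)                    # (batch, patches, dim)
causal_mask = nn.Transformer.generate_square_subsequent_mask(n_tokens)
out = stack(tokens, mask=causal_mask)   # o_j attends only to tokens 1..j
```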

TimesFM model architecture: Image taken from Google AI’s research blog. Source: https://blog.research.google/2024/02/a-decoder-only-foundation-model-for.html

Performance

To measure the model's predictions, the MSE (mean squared error) metric is used. It quantifies the closeness between actual and predicted values.
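For example, with NumPy (toy values, purely for illustration):

```python
# MSE between actual and predicted values; smaller means closer predictions.
import numpy as np

actual = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])
mse = np.mean((actual - predicted) ** 2)   # (0.25 + 0 + 2.25 + 1) / 4
print(mse)                                  # 0.875
```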

TimesFM is evaluated through zero-shot testing on various time-series benchmarks, showcasing superior performance compared to statistical methods like ARIMA and DL models like DeepAR and PatchTST. The model’s effectiveness is demonstrated on the Monash Forecasting Archive and popular benchmarks for long-horizon forecasting, outperforming both supervised and zero-shot approaches.

Conclusion

The research presents a promising decoder-only foundation model for time-series forecasting, exhibiting impressive zero-shot performance across different domains and granularities.

The proposed TimesFM architecture, despite its smaller size, proves to be a powerful tool for accurate and efficient forecasting in diverse applications. The work represents a collaboration across Google Research and Google Cloud, involving several individuals contributing to the development and evaluation of TimesFM.

In my view, this is a promising approach to minimize development time and increase model performance for time series predictions.

Related reads and important information

Later this year, Google is planning to make this model available for external customers in Google Cloud Vertex AI.

  1. Research paper: A decoder-only foundation model for time-series forecasting
  2. Google AI’s blog about this paper