Chronos: Another Zero-Shot Time Series Forecaster LLM
Pretrained language models revolutionize the way we predict trends and patterns in data.
Introduction
Time series forecasting is a crucial task across a wide range of domains, from finance and economics to healthcare, energy, and climate science. Traditional approaches have relied on statistical models like ARIMA or exponential smoothing. In recent years, deep learning methods have gained popularity, achieving state-of-the-art performance by leveraging patterns across large datasets.
Challenges in Time Series Forecasting with LLMs:
Most deep learning models require task-specific training on each dataset, and large, high-quality time series datasets are scarce compared to domains like natural language processing (NLP). Developing generalist models that can forecast on new datasets in a zero-shot manner, without any dataset-specific training, remains an open challenge.
What Does Chronos Offer?
Chronos is a novel time series forecasting framework proposed by researchers from Amazon Web Services. It takes a radically simple approach, repurposing the language models that have achieved remarkable success in NLP for time series forecasting. Let’s dive into how Chronos works and the promising results it achieves.
Tokenizing Time Series into a “Language”
The key insight behind Chronos is that time series forecasting can be formulated in a similar manner to a language model predicting the next word. Language models operate on a fixed vocabulary of discrete tokens, while time series have real-valued, often continuous, observations.
Chronos bridges this gap by quantizing the time series values into a fixed set of tokens through a simple process:
- Mean scaling: Scale each time series by its mean absolute value to normalize the data
- Quantization: Map the scaled values into a fixed number of bins, assigning each bin a discrete token
Mathematically, the mean scaling operation can be represented as:
x̃ᵢ = (xᵢ − m) / s
where xᵢ is the i-th value in the time series, m is an offset (set to 0 for mean scaling), and s is the scale factor, set to the mean of the absolute values over the historical context.
The quantization process involves defining a set of bin centers c₁ < ⋯ < c_B and edges bᵢ such that cᵢ < bᵢ < cᵢ₊₁. The quantization function q and dequantization function d are then defined as:
q(x) = 1 if −∞ ≤ x < b₁
       2 if b₁ ≤ x < b₂
       ⋮
       B if b_{B−1} ≤ x < ∞
d(j) = cⱼ
This tokenization process allows any time series to be represented as a sequence of tokens from a fixed “time series vocabulary”, in the same format that a language model expects.
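To make the tokenization concrete, here is a minimal Python sketch of mean scaling, quantization, and dequantization. The uniform bin layout, value range, and function names are illustrative assumptions on my part, not the exact implementation from the Chronos repository.
import numpy as np

def mean_scale(context: np.ndarray) -> tuple[np.ndarray, float]:
    # Scale a series by the mean of its absolute values (mean scaling).
    s = np.abs(context).mean()
    s = s if s > 0 else 1.0  # guard against an all-zero context
    return context / s, s

def quantize(scaled: np.ndarray, bin_centers: np.ndarray) -> np.ndarray:
    # Bin edges sit halfway between consecutive centers; np.digitize
    # returns the index of the interval each value falls into.
    edges = (bin_centers[1:] + bin_centers[:-1]) / 2
    return np.digitize(scaled, edges)  # token ids in [0, B-1]

def dequantize(tokens: np.ndarray, bin_centers: np.ndarray, s: float) -> np.ndarray:
    # Invert tokenization: look up the bin center and undo the scaling.
    return bin_centers[tokens] * s

# Illustrative setup: uniform bin centers over an assumed scaled-value range
bin_centers = np.linspace(-15, 15, 4094)
series = np.array([112.0, 118.0, 132.0, 129.0, 121.0])
scaled, s = mean_scale(series)
tokens = quantize(scaled, bin_centers)
reconstructed = dequantize(tokens, bin_centers, s)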
Leveraging Pretrained Transformer Architectures
With the time series quantized into tokens, Chronos can directly utilize existing pretrained language models with minimal changes. The researchers focused on encoder-decoder models like T5, as well as decoder-only models like GPT-2.
The only architectural modification is changing the embedding layers to match the time series vocabulary size. Otherwise, the Transformer architecture is used as is. This is a notably minimalist approach — Chronos does not introduce any time series-specific architectural changes like temporal convolutions or time features. It simply treats the time series as a token sequence.
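As a rough sketch of how small this change is, one could instantiate a T5 model with the time series vocabulary using Hugging Face transformers. The checkpoint name and the 4,096-token vocabulary below follow the repository's description, but the snippet is an illustration of the idea, not the actual Chronos training code.
from transformers import T5Config, T5ForConditionalGeneration

# Start from a T5 configuration and shrink the vocabulary to the
# quantization bins plus special tokens (assumed here to total 4096).
config = T5Config.from_pretrained("google/t5-efficient-small", vocab_size=4096)

# A freshly initialized T5 with the smaller embedding and output layers;
# the rest of the Transformer architecture is left untouched.
model = T5ForConditionalGeneration(config)
print(model.shared)  # shared token embedding: Embedding(4096, d_model)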
Following standard language model training, Chronos is trained to predict the next token given the preceding ones. It uses a classic cross-entropy loss between the predicted distribution over tokens and the ground truth next token.
The loss function for a single tokenized time series is given by:
ℓ(θ) = − Σ_{h=1}^{H+1} Σ_{i=1}^{|Vts|} 𝟏(z_{C+h+1} = i) log p_θ(z_{C+h+1} = i | z_{1:C+h})
where p_θ(z_{C+h+1} = i | z_{1:C+h}) denotes the categorical distribution predicted by the model parameterized by θ, C is the length of the historical context, H is the prediction horizon, and |Vts| is the size of the time series vocabulary.
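In code, this objective is the same next-token cross-entropy used to train any language model, applied to the quantized sequence. The snippet below is a generic sketch with made-up tensor shapes, not code from the Chronos repository.
import torch
import torch.nn.functional as F

vocab_size = 4096          # size of the time series vocabulary |Vts|
batch, horizon = 8, 24     # illustrative batch size and prediction horizon

# logits: the model's categorical distribution over tokens at each step
logits = torch.randn(batch, horizon, vocab_size)
# targets: the ground-truth next tokens (quantized future observations)
targets = torch.randint(0, vocab_size, (batch, horizon))

# Standard categorical cross-entropy, averaged over positions and batch
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))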
Importantly, Chronos is pretrained on a large corpus of time series from diverse domains. This pretraining allows it to learn general patterns that can transfer to new datasets.
Synthetic Time Series Generation
Despite leveraging multiple public time series datasets, the researchers found that pretraining data was still limited compared to typical language modeling datasets. To enhance data diversity, they propose two augmentation approaches:
1. TSMix: This extends the mixup augmentation used in computer vision to randomly combine multiple time series at training time.
The TSMix augmentation generates new time series by taking a convex combination of k randomly sampled time series:
x̃_TSMix(1:l) = Σ_{i=1}^{k} λᵢ · x̃^{(i)}(1:l)
where λᵢ are the mixing weights sampled from a Dirichlet distribution.
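A minimal sketch of this mixing step is shown below, assuming the candidate series have already been scaled to a comparable range; the Dirichlet concentration and the bank of series are illustrative choices rather than the paper's exact settings.
import numpy as np

def tsmix(series_bank: np.ndarray, k: int = 3, alpha: float = 1.5) -> np.ndarray:
    # Convex combination of k randomly chosen (already scaled) series.
    idx = np.random.choice(len(series_bank), size=k, replace=False)
    weights = np.random.dirichlet(alpha * np.ones(k))  # non-negative, sums to 1
    return (weights[:, None] * series_bank[idx]).sum(axis=0)

# series_bank: [num_series, length] array of scaled training series
series_bank = np.random.randn(100, 256)
augmented = tsmix(series_bank)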
2. KernelSynth: This method generates synthetic time series via random combinations of Gaussian process kernels. By combining simpler kernels representing trends, seasonality, and noise in a probabilistic manner, KernelSynth can yield realistic synthetic series.
The final kernel κ̃(t, t’) is constructed by sampling j kernels from a kernel bank K and combining them via random binary operations:
κ̃(t, t’) = κ₁(t, t’) ★₁ ⋯ ★ⱼ₋₁ κⱼ(t, t’)
where ★ᵢ ∈ {+, ×} are the binary operators. A synthetic time series is then generated by sampling from the Gaussian process prior 𝒢𝒫(0, κ̃(t, t’)).
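The sketch below mimics this idea using scikit-learn's Gaussian process kernels: it draws a random composite kernel and samples one series from the corresponding GP prior. The kernel bank, series length, and operator sampling are assumptions for illustration, not the exact KernelSynth configuration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct, WhiteKernel

# Illustrative kernel bank: linear trend, smooth variation, seasonality, noise
KERNEL_BANK = [
    DotProduct(),
    RBF(length_scale=10.0),
    ExpSineSquared(periodicity=24.0),
    WhiteKernel(noise_level=0.1),
]

def kernel_synth(length: int = 256, max_kernels: int = 4) -> np.ndarray:
    # Sample j kernels and combine them with random + or * operations.
    j = np.random.randint(1, max_kernels + 1)
    picks = np.random.choice(len(KERNEL_BANK), size=j)
    composite = KERNEL_BANK[picks[0]]
    for i in picks[1:]:
        composite = composite + KERNEL_BANK[i] if np.random.rand() < 0.5 else composite * KERNEL_BANK[i]
    # Draw one synthetic series from the GP prior with the composite kernel.
    t = np.arange(length, dtype=float).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=composite)
    return gp.sample_y(t, n_samples=1, random_state=None).ravel()

synthetic_series = kernel_synth()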
Adding these augmentations, especially KernelSynth, to the pretraining data substantially improved Chronos’ performance in the experiments.
Some Experimental Results
The researchers conducted extensive experiments on 42 datasets from diverse domains and frequencies, comparing Chronos against a wide range of baselines, including classical statistical models as well as state-of-the-art deep learning models specifically designed and tuned for time series forecasting.
On 15 datasets used in pretraining, finetuning a pretrained Chronos model on each dataset outperforms all baselines, both in terms of point and probabilistic forecast accuracy. More impressively, on 27 new datasets not seen during pretraining, zero-shot forecasts from Chronos are competitive with dataset-specific models, outperforming most baselines. This indicates strong transfer learning capabilities.
The researchers also conducted extensive ablations and qualitative analyses. Larger Chronos models generally perform better but have slower inference. No time series-specific modification to the architecture was found to be necessary, although the token quantization can occasionally fail on very sparse or noisy series. Overall, the results suggest that pretraining on large and diverse time series data is effective for learning general patterns.
Code:
I still need to experiment with it a bit more before sharing my own views, but here is the code shared on their GitHub repository.
Architecture
The models in this repository are based on the T5 architecture. The only difference is in the vocabulary size: Chronos-T5 models use 4096 different tokens, compared to 32128 of the original T5 models, resulting in fewer parameters.
Model | Parameters | Based on
chronos-t5-tiny | 8M | t5-efficient-tiny
chronos-t5-mini | 20M | t5-efficient-mini
chronos-t5-small | 46M | t5-efficient-small
chronos-t5-base | 200M | t5-efficient-base
chronos-t5-large | 710M | t5-efficient-large
Usage
To perform inference with Chronos models, install this package by running:
pip install git+https://github.com/amazon-science/chronos-forecasting.git
A minimal example showing how to perform inference using Chronos models:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from chronos import ChronosPipeline
pipeline = ChronosPipeline.from_pretrained(
"amazon/chronos-t5-small",
device_map="cuda",
torch_dtype=torch.bfloat16,
)
df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")
# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
context = torch.tensor(df["#Passengers"])
prediction_length = 12
forecast = pipeline.predict(context, prediction_length) # shape [num_series, num_samples, prediction_length]
# visualize the forecast
forecast_index = range(len(df), len(df) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
plt.figure(figsize=(8, 4))
plt.plot(df["#Passengers"], color="royalblue", label="historical data")
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(forecast_index, low, high, color="tomato", alpha=0.3, label="80% prediction interval")
plt.legend()
plt.grid()
plt.show()
Looking Forward
The Chronos framework presents an exciting new direction for time series forecasting by repurposing pretrained language models. It achieves promising results with a remarkably simple approach, relying mainly on the power of pretraining and data scale.
However, challenges remain, especially in expanding high-quality time series datasets. Incorporating additional inputs like covariates also merits further study. Improving the efficiency of the token quantization and inference speed will help in real-world deployment.
Chronos also creates opportunities to leverage advances in language modeling, such as longer-context models, retrieval augmentation, and efficient finetuning. Combining Chronos with time series-specific techniques like temporal convolutions is another promising avenue.
Conclusion
The success of Chronos and foundation models in other domains suggests that pretrained neural networks can excel as generalist forecasters and simplify pipelines compared to dataset-specific approaches. With further research and refinement, frameworks like Chronos may revolutionize practical time series modeling. Progress will likely emerge from jointly scaling time series data collection, model capacity, and empirical iteration to meet the unique challenges of the time series domain.