Toto: Time Series Optimized Transformer for Observability
Ben Cohen, Emaad Khwaja, Kan Wang, Charles Masson, Elise Ramé, Youssef Doubli, Othmane Abou-Amal
2024-07-15

Summary
This paper presents Toto, a time series forecasting foundation model developed by Datadog and tuned specifically for observability metrics.
What's the problem?
Time series data record how a quantity changes over time (for example, weather readings or electricity usage), and existing models can struggle to forecast them accurately. Observability metrics, which are used to monitor software systems in real time, are especially challenging, and earlier general-purpose forecasting models were not tuned for their characteristics.
What's the solution?
Toto addresses these challenges by training on one trillion time series data points, the largest training set among all currently published time series foundation models. Alongside publicly available time series datasets, 75% of this data consists of fully anonymous numerical metric data points from the Datadog platform, which tunes the model to the characteristics of observability metrics. Architecturally, Toto uses a proportional factorized space-time attention mechanism and a Student-T mixture model output head. With these choices it outperforms existing time series foundation models on observability data while also achieving state-of-the-art zero-shot performance on multiple open benchmark datasets.
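The idea behind factorized space-time attention can be sketched as follows. Instead of attending over every (variate, time step) pair at once, time-wise layers attend along the time axis within each variate, and space-wise layers attend across variates at each time step. "Proportional" is read here, as an assumption, as interleaving a fixed ratio of time-wise to space-wise layers. This is a minimal NumPy sketch, not Toto's implementation: single-head attention with identity projections, the residual connection, the helper names `axis_attention`, `layer_schedule`, and `factorized_stack`, and the ratio parameter `time_per_space` are all hypothetical.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def axis_attention(x: np.ndarray, axis: int, causal: bool) -> np.ndarray:
    """Single-head self-attention along one axis of x, shape (variates, time, dim).

    Q, K and V projections are omitted (identity) to keep the sketch short.
    """
    # Move the attended axis next to the feature axis: (..., seq, dim).
    seqs = np.moveaxis(x, axis, -2)
    dim = seqs.shape[-1]
    scores = seqs @ np.swapaxes(seqs, -1, -2) / np.sqrt(dim)
    if causal:
        # Block attention from each position to any later position.
        n = seqs.shape[-2]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    out = softmax(scores) @ seqs
    # Restore the original (variates, time, dim) layout.
    return np.moveaxis(out, -2, axis)


def layer_schedule(num_layers: int, time_per_space: int) -> list[str]:
    # Hypothetical proportional schedule: after every `time_per_space`
    # time-wise layers, insert one space-wise layer.
    return [
        "space" if (i + 1) % (time_per_space + 1) == 0 else "time"
        for i in range(num_layers)
    ]


def factorized_stack(x: np.ndarray, num_layers: int = 6, time_per_space: int = 2) -> np.ndarray:
    for kind in layer_schedule(num_layers, time_per_space):
        if kind == "time":
            # Causal attention along the time axis, independently per variate.
            x = x + axis_attention(x, axis=1, causal=True)
        else:
            # Attention across variates within the same time step.
            x = x + axis_attention(x, axis=0, causal=False)
    return x


rng = np.random.default_rng(0)
series = rng.normal(size=(3, 5, 4))  # 3 variates, 5 time steps, 4 features
print(layer_schedule(6, 2))
print(factorized_stack(series).shape)
```

Because space-wise layers only mix variates within a single time step and time-wise layers are causally masked, a later time step can never influence an earlier output, and each layer's cost scales with one axis at a time rather than with the product of both.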
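A Student-T mixture model head can be illustrated as a probabilistic output: for each forecast step the model emits mixture weights plus a location, scale, and degrees of freedom per component. The sketch below assumes that training minimizes the negative log-likelihood of the observed value under that mixture; the function names and parameter values are hypothetical, and this is not Toto's code. Student-T components have heavier tails than Gaussians, which is a common reason to use them for data with outliers.

```python
import math


def student_t_logpdf(x: float, loc: float, scale: float, df: float) -> float:
    # Log-density of a location-scale Student-T distribution.
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2 * math.log1p(z * z / df)
    )


def mixture_nll(x: float, weights, locs, scales, dfs) -> float:
    # Negative log-likelihood of x under a weighted Student-T mixture.
    # Weights must be positive and sum to 1.
    terms = [
        math.log(w) + student_t_logpdf(x, m, s, v)
        for w, m, s, v in zip(weights, locs, scales, dfs)
    ]
    # Log-sum-exp keeps the sum of tiny densities numerically stable.
    peak = max(terms)
    log_prob = peak + math.log(sum(math.exp(t - peak) for t in terms))
    return -log_prob


# A spike at x = 10 is far cheaper under a two-component mixture
# that reserves some weight for large values.
print(mixture_nll(10.0, [0.9, 0.1], [0.0, 10.0], [1.0, 1.0], [3.0, 3.0]))
print(mixture_nll(10.0, [1.0], [0.0], [1.0], [3.0]))
```

In a trained network the weights would typically come from a softmax over model outputs and the scales from a positive transform such as softplus; here they are passed in directly to keep the example small.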
Why it matters?
This research matters because more accurate forecasts of observability metrics make it easier to monitor complex systems in real time. Better forecasts can help businesses and organizations understand their operations, respond to issues sooner, and improve overall performance.
Abstract
This technical report describes the Time Series Optimized Transformer for Observability (Toto), a new state-of-the-art foundation model for time series forecasting developed by Datadog. In addition to advancing the state of the art on generalized time series benchmarks in domains such as electricity and weather, this model is the first general-purpose time series forecasting foundation model to be specifically tuned for observability metrics. Toto was trained on a dataset of one trillion time series data points, the largest among all currently published time series foundation models. Alongside publicly available time series datasets, 75% of the data used to train Toto consists of fully anonymous numerical metric data points from the Datadog platform. In our experiments, Toto outperforms existing time series foundation models on observability data. It does this while also excelling at general-purpose forecasting tasks, achieving state-of-the-art zero-shot performance on multiple open benchmark datasets.