LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
Yueyang Ding, HaoPeng Zhang, Rui Dai, Yi Wang, Tianyu Zong, Kaikui Liu, Xiangxiang Chu
2026-04-24
Summary
This paper focuses on the difficulty Large Language Models (LLMs) have with time series data, meaning data that changes over time, such as stock prices or weather patterns. The researchers created a new way to test these models and built a new model, LLaTiSA, that performs better at understanding this type of data.
What's the problem?
Currently, it's hard to accurately judge how well LLMs understand time series because the tests used aren't clearly defined and often have multiple possible answers. This makes it difficult to build models that can reliably reason about data that changes over time. Essentially, there wasn't a standardized, challenging benchmark to push the development of these models.
What's the solution?
The researchers first created a detailed system for categorizing the different kinds of thinking needed to understand time series data, ranging from simple perception to complex reasoning. Then, they built a large dataset called HiTSR with over 83,000 examples spanning all of these levels. Finally, they developed LLaTiSA, a model that combines visual representations of temporal patterns with precise numerical tables so it can perceive time series more accurately. They trained LLaTiSA in stages, starting with easier tasks and gradually increasing the difficulty.
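To make the two ideas above concrete, here is a minimal sketch of (a) rendering a series as a fixed-precision numerical table that could accompany a plot image in a multimodal prompt, and (b) ordering training samples into stages of increasing difficulty. All function names, the table format, and the replay-easier-levels choice are illustrative assumptions, not the authors' actual implementation.

```python
def numeric_table(series, precision=2):
    """Render a series as a fixed-precision text table for an LLM prompt.
    (Hypothetical stand-in for the paper's 'precision-calibrated' tables.)"""
    header = "t | value"
    rows = [f"{t} | {v:.{precision}f}" for t, v in enumerate(series)]
    return "\n".join([header] + rows)

def curriculum_stages(samples, n_stages=4):
    """Group samples (each tagged with a difficulty level 1..n_stages) into
    stages of increasing difficulty; stage k also replays all easier levels
    so skills learned early are retained (an assumed design choice)."""
    stages = []
    for level in range(1, n_stages + 1):
        stages.append([s for s in samples if s["level"] <= level])
    return stages

samples = [
    {"id": "a", "level": 1},  # e.g. perception: read off a value
    {"id": "b", "level": 2},  # e.g. identify a trend
    {"id": "c", "level": 4},  # e.g. multi-step semantic reasoning
]
print(numeric_table([1.0, 2.5, 2.0]))
print([len(s) for s in curriculum_stages(samples)])  # stage sizes grow
```

In this sketch, each stage's training set is a superset of the previous one, so later stages mix hard examples with easy ones; the paper's actual staging schedule may differ.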
Why it matters?
This work is important because it provides a clear framework and a challenging dataset for evaluating and improving LLMs' ability to understand time series data. This is crucial for applications like predicting trends, forecasting future events, and making informed decisions based on data that changes over time, impacting fields like finance, healthcare, and environmental science.
Abstract
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.