Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language

Yunkai Zhang, Yawen Zhang, Ming Zheng, Kezhen Chen, Chongyang Gao, Ruian Ge, Siyuan Teng, Amine Jelloul, Jinmeng Rao, Xiaoyuan Guo, Chiang-Wei Fang, Zeyu Zheng, Jie Yang

2025-12-19

Summary

This paper introduces Insight Miner, a model designed to automatically explain what is happening in time-series data (such as stock prices, weather patterns, or sensor readings) in plain language.

What's the problem?

Understanding time-series data usually requires a lot of specialized knowledge and takes a long time because experts need to carefully analyze the data and figure out what the trends mean. It's difficult to quickly get useful insights without a deep understanding of the specific field the data comes from.

What's the solution?

The researchers created a new dataset called TS-Insights, which pairs time-series windows with descriptions of what is happening in them. To build it, they used statistical tools to extract features from the raw series and then had a language model (GPT-4) turn those features into clear, understandable trend descriptions. They then trained Insight Miner on TS-Insights so that it learns to generate such descriptions automatically. After this training, the paper reports that Insight Miner describes time-series data better than existing multimodal models such as LLaVA and GPT-4.

Why does it matter?

This work is important because it moves us closer to a future where computers can understand and explain time-series data without needing a human expert. That could speed up the analysis of trends and patterns in fields such as environmental analysis, agriculture, transportation, and finance, and make complex data easier for non-experts to understand.

Abstract

Time-series data is critical across many scientific and industrial domains, including environmental analysis, agriculture, transportation, and finance. However, mining insights from this data typically requires deep domain expertise, a process that is both time-consuming and labor-intensive. In this paper, we propose Insight Miner, a large-scale multimodal model (LMM) designed to generate high-quality, comprehensive time-series descriptions enriched with domain-specific knowledge. To facilitate this, we introduce TS-Insights (available at https://huggingface.co/datasets/zhykoties/time-series-language-alignment), the first general-domain dataset for time series and language alignment. TS-Insights contains 100k time-series windows sampled from 20 forecasting datasets. We construct this dataset using a novel agentic workflow, where we use statistical tools to extract features from raw time series before synthesizing them into coherent trend descriptions with GPT-4. Following instruction tuning on TS-Insights, Insight Miner outperforms state-of-the-art multimodal models, such as LLaVA (Liu et al., 2023) and GPT-4, in generating time-series descriptions and insights. Our findings suggest a promising direction for leveraging LMMs in time series analysis, and serve as a foundational step toward enabling LLMs to interpret time series as a native input modality.
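
The dataset construction step described above (statistical feature extraction followed by description synthesis) can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature set, the function names `linear_slope`, `extract_features`, and `build_prompt`, and the prompt wording are all assumptions made for illustration, and the call to a language model is deliberately left out.

```python
from statistics import mean, pstdev


def linear_slope(values):
    """Least-squares slope of the values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def extract_features(window):
    """Summarize one window with a few simple statistics.

    Hypothetical feature set; the abstract does not list the actual
    statistical tools used to build TS-Insights.
    """
    slope = linear_slope(window)
    if slope > 0:
        direction = "upward"
    elif slope < 0:
        direction = "downward"
    else:
        direction = "flat"
    return {
        "direction": direction,
        "slope": slope,
        "min": min(window),
        "max": max(window),
        "std": pstdev(window),
    }


def build_prompt(features):
    """Turn the extracted features into text that a language model
    could be asked to rewrite as a trend description (assumed wording)."""
    return (
        "Describe the trend of a time series window with these statistics: "
        f"direction={features['direction']}, "
        f"slope={features['slope']:.3f}, "
        f"min={features['min']}, max={features['max']}, "
        f"std={features['std']:.3f}."
    )


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5]
    print(build_prompt(extract_features(window)))
```

In a workflow like the one the abstract describes, text of this kind would be passed to a language model such as GPT-4, and the returned description would be paired with the original window as one training example.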
