OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data

Patrick Langer, Thomas Kaar, Max Rosenblattl, Maxwell A. Xu, Winnie Chow, Martin Maritsch, Aradhana Verma, Brian Han, Daniel Seung Kim, Henry Chubb, Scott Ceresnak, Aydin Zahedivash, Alexander Tarlochan Singh Sandhu, Fatima Rodriguez, Daniel McDuff, Elgar Fleisch, Oliver Aalami, Filipe Barata, Paul Schmiedmayer

2025-10-06

Summary

This paper introduces OpenTSLM, a new type of language model designed to understand and reason about time series data, like medical readings that change over time, alongside regular text.

What's the problem?

Large language models are very good at understanding text and even images, but they struggle with time series data: measurements collected over time, such as heartbeats or sleep patterns. Existing ways of feeding this data into an LLM, such as rendering the readings as text tokens or as plots, either perform poorly or require a huge amount of memory, especially for long recordings.
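To see why long recordings are a problem, here is a purely illustrative sketch (not code from the paper) of the naive approach of pasting raw sensor values into the prompt as text; the values and wording are made up. Every number becomes several text tokens, so high-frequency data quickly exhausts the model's context and memory.

```python
# Purely illustrative: a made-up ECG snippet serialized as text.
# Each number turns into several tokens, so long recordings blow up the prompt.
ecg_window = [0.01, 0.03, 0.12, 0.57, 0.98, 0.44, -0.10]  # hypothetical samples

prompt = (
    "The following ECG values were recorded: "
    + ", ".join(f"{v:.2f}" for v in ecg_window)
    + ". What rhythm does this show?"
)
print(prompt)
# A 30-second ECG sampled at 500 Hz would contribute 15,000 numbers to the prompt.
```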

What's the solution?

The researchers created OpenTSLM, which comes in two variants. The first, OpenTSLM-SoftPrompt, encodes the time series into learnable 'time series tokens' that are concatenated with the text tokens the model processes. The second, OpenTSLM-Flamingo, fuses the time series with the text inside the model using cross-attention. They tested both variants on three new chain-of-thought datasets covering human activity recognition, sleep staging, and ECG question answering (HAR-CoT, Sleep-CoT, and ECG-QA-CoT), comparing them against baselines that treat time series as text or as plots. All code, datasets, and models are released publicly.
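The soft-prompt idea can be sketched in a few lines of PyTorch. This is a simplified illustration under assumed shapes, layer choices, and embedding size, not the authors' implementation: a small encoder compresses the raw time series into a fixed number of embedding vectors, which are concatenated with the text token embeddings before the pretrained LLM processes them.

```python
import torch
import torch.nn as nn

class SoftPromptTS(nn.Module):
    """Sketch of soft-prompt fusion: a small encoder turns a raw time series
    into a handful of embedding vectors that are concatenated with the text
    token embeddings before they enter a (frozen) LLM. Layer choices and
    dimensions here are assumptions for illustration only."""

    def __init__(self, d_model: int = 2048, n_ts_tokens: int = 8):
        super().__init__()
        self.ts_encoder = nn.Sequential(            # placeholder encoder
            nn.Conv1d(1, 64, kernel_size=7, stride=4),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(n_ts_tokens),      # -> fixed number of tokens
        )
        self.project = nn.Linear(64, d_model)       # map into LLM embedding space

    def forward(self, ts: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # ts: (batch, 1, length), text_emb: (batch, n_text, d_model)
        ts_tokens = self.ts_encoder(ts).transpose(1, 2)  # (batch, n_ts_tokens, 64)
        ts_emb = self.project(ts_tokens)                 # (batch, n_ts_tokens, d_model)
        return torch.cat([ts_emb, text_emb], dim=1)      # prepend to the text

# Example with random data: 8 time-series tokens joined to 32 text tokens -> 40 tokens.
fusion = SoftPromptTS()
combined = fusion(torch.randn(2, 1, 3000), torch.randn(2, 32, 2048))  # (2, 40, 2048)
```

The trade-off, as the abstract below notes, is that the combined token sequence (and with it memory use) grows as the recordings get longer.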

Why it matters?

This work matters because it lets language models interpret medical data that changes over time, not just static text. OpenTSLM outperforms existing approaches, with even its 1B-parameter models surpassing much larger models such as GPT-4o on these tasks, and the cross-attention variant keeps memory requirements stable on long recordings. This could lead to better digital health applications and more accurate medical insights.

Abstract

LLMs have emerged as powerful tools for interpreting multimodal data. In medicine, they hold particular promise for synthesizing large volumes of clinical information into actionable insights and digital health applications. Yet, a major limitation remains their inability to handle time series. To overcome this gap, we present OpenTSLM, a family of Time Series Language Models (TSLMs) created by integrating time series as a native modality to pretrained LLMs, enabling reasoning over multiple time series of any length. We investigate two architectures for OpenTSLM. The first, OpenTSLM-SoftPrompt, models time series implicitly by concatenating learnable time series tokens with text tokens via soft prompting. Although parameter-efficient, we hypothesize that explicit time series modeling scales better and outperforms implicit approaches. We thus introduce OpenTSLM-Flamingo, which integrates time series with text via cross-attention. We benchmark both variants against baselines that treat time series as text tokens or plots, across a suite of text-time-series Chain-of-Thought (CoT) reasoning tasks. We introduce three datasets: HAR-CoT, Sleep-CoT, and ECG-QA-CoT. Across all, OpenTSLM models outperform baselines, reaching 69.9 F1 in sleep staging and 65.4 in HAR, compared to 9.05 and 52.2 for finetuned text-only models. Notably, even 1B-parameter OpenTSLM models surpass GPT-4o (15.47 and 2.95). OpenTSLM-Flamingo matches OpenTSLM-SoftPrompt in performance and outperforms on longer sequences, while maintaining stable memory requirements. By contrast, SoftPrompt grows exponentially in memory with sequence length, requiring around 110 GB compared to 40 GB VRAM when training on ECG-QA with LLaMA-3B. Expert reviews by clinicians find strong reasoning capabilities exhibited by OpenTSLMs on ECG-QA. To facilitate further research, we provide all code, datasets, and models open-source.
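For completeness, here is a hedged sketch of the Flamingo-style cross-attention fusion the abstract describes. The dimensions, the zero-initialized gate, and the layer choices are assumptions in the spirit of the original Flamingo design, not the exact OpenTSLM-Flamingo configuration: text hidden states query a set of time-series features, so the time series never lengthens the token sequence itself.

```python
import torch
import torch.nn as nn

class GatedCrossAttentionBlock(nn.Module):
    """Sketch of Flamingo-style fusion: text hidden states attend to
    time-series features via cross-attention, and a learned gate
    (initialized at zero) lets the pretrained LLM start out unchanged.
    Shapes and hyperparameters are illustrative assumptions."""

    def __init__(self, d_model: int = 2048, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh-gated residual, starts at 0

    def forward(self, text_h: torch.Tensor, ts_feats: torch.Tensor) -> torch.Tensor:
        # text_h: (batch, n_text, d_model) hidden states from an LLM layer
        # ts_feats: (batch, n_ts, d_model) features from a time-series encoder
        attn_out, _ = self.cross_attn(self.norm(text_h), ts_feats, ts_feats)
        return text_h + torch.tanh(self.gate) * attn_out

# Example with random data: 16 text positions attend to 8 time-series features.
block = GatedCrossAttentionBlock()
out = block(torch.randn(2, 16, 2048), torch.randn(2, 8, 2048))  # (2, 16, 2048)
```

Because the time series enters only through cross-attention, the text sequence length (and the self-attention cost that comes with it) stays fixed, which is consistent with the stable memory footprint the abstract reports for OpenTSLM-Flamingo.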