ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Zechen Li, Baiyu Chen, Hao Xue, Flora D. Salim
2025-08-20
Summary
This paper introduces ZARA, a new system for recognizing human activities from motion sensor data without retraining for new activities or sensor setups. It is the first agent-based framework to do this zero-shot while also explaining its predictions in plain language.
What's the problem?
Current methods for recognizing human activities from motion sensor data are rigid: they must be retrained whenever new activities or different sensor setups appear, which is slow and expensive. Recent attempts to use large language models instead, typically by converting the signals into text or images, have suffered from limited accuracy and offer no verifiable explanation of their reasoning.
What's the solution?
ZARA is a framework built from multiple cooperating LLM agents. It automatically builds a knowledge base of statistics that distinguish each pair of activities, uses a multi-sensor retrieval module to surface relevant sensor data as evidence, and then guides a large language model through a step-by-step process: the LLM iteratively selects key features, weighs the retrieved evidence, and finally predicts the activity together with a plain-English explanation of how it reached that conclusion. A minimal sketch of this flow follows.
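The Python sketch below illustrates the three-stage flow described above, under stated assumptions: every name here (call_llm, classify_window, the Evidence fields, the retrieve callback) is hypothetical and not taken from the ZARA codebase; the real pipeline lives at https://github.com/zechenli03/ZARA.

from dataclasses import dataclass

@dataclass
class Evidence:
    """One piece of retrieved support (hypothetical shape of a retrieval hit)."""
    sensor: str      # e.g. "wrist_accel"
    feature: str     # e.g. "std_magnitude"
    value: float     # statistic computed from the query window
    reference: str   # matching entry from the pairwise knowledge base

def call_llm(prompt: str) -> str:
    """Placeholder: wire this to an LLM provider of your choice."""
    raise NotImplementedError

def classify_window(window, knowledge_base, retrieve, candidates):
    # Step 1: the LLM selects features that separate the candidate
    # activities, drawing on the pairwise feature knowledge base.
    features = call_llm(
        f"Candidate activities: {candidates}\n"
        f"Knowledge base: {knowledge_base}\n"
        "Which features best separate each pair of candidates?"
    )
    # Step 2: the retrieval module surfaces matching multi-sensor evidence
    # (a list of Evidence records) for the selected features.
    evidence = retrieve(window, features)
    # Step 3: the LLM predicts an activity and justifies it in plain English.
    return call_llm(
        f"Selected features: {features}\n"
        f"Retrieved evidence: {evidence}\n"
        "Predict the activity and explain your reasoning step by step."
    )

The key design point this sketch captures is that the LLM never sees raw signal values: it reasons over named features and retrieved evidence, which is what makes the final explanation checkable.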
Why does it matter?
This research matters because it enables flexible, understandable human activity recognition without constant retraining or complicated custom classifiers. That lowers the barrier to using motion sensor data in applications such as health monitoring and smart devices, and the clear, plain-language explanations build trust in the predictions, a significant step toward more reliable and user-friendly motion analysis.
Abstract
Motion sensor time-series are central to human activity recognition (HAR), with applications in health, sports, and smart devices. However, existing methods are trained for fixed activity sets and require costly retraining when new behaviours or sensor setups appear. Recent attempts to use large language models (LLMs) for HAR, typically by converting signals into text or images, suffer from limited accuracy and lack verifiable interpretability. We propose ZARA, the first agent-based framework for zero-shot, explainable HAR directly from raw motion time-series. ZARA integrates an automatically derived pair-wise feature knowledge base that captures discriminative statistics for every activity pair, a multi-sensor retrieval module that surfaces relevant evidence, and a hierarchical agent pipeline that guides the LLM to iteratively select features, draw on this evidence, and produce both activity predictions and natural-language explanations. ZARA enables flexible and interpretable HAR without any fine-tuning or task-specific classifiers. Extensive experiments on 8 HAR benchmarks show that ZARA achieves SOTA zero-shot performance, delivering clear reasoning while exceeding the strongest baselines by 2.53x in macro F1. Ablation studies further confirm the necessity of each module, marking ZARA as a promising step toward trustworthy, plug-and-play motion time-series analysis. Our code is available at https://github.com/zechenli03/ZARA.
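To make the abstract's "automatically derived pair-wise feature knowledge base" concrete, the sketch below scores a few simple window statistics by how well they separate each activity pair. The feature set and the use of Cohen's d as the separability score are assumptions for illustration, not the paper's actual choices.

from itertools import combinations
import numpy as np

# Hypothetical feature extractors over a window w of shape (T, 3),
# e.g. a tri-axial accelerometer segment.
FEATURES = {
    "mean_mag": lambda w: np.mean(np.linalg.norm(w, axis=1)),
    "std_mag":  lambda w: np.std(np.linalg.norm(w, axis=1)),
    "energy":   lambda w: np.mean(np.square(w)),
}

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference between two samples of feature values."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2) + 1e-12
    return abs(a.mean() - b.mean()) / pooled

def build_knowledge_base(windows_by_activity: dict[str, list[np.ndarray]]):
    """For every activity pair, rank features by how well they separate it."""
    # Pre-compute every feature for every window of every activity.
    vals = {
        act: {f: np.array([fn(w) for w in ws]) for f, fn in FEATURES.items()}
        for act, ws in windows_by_activity.items()
    }
    kb = {}
    for a, b in combinations(windows_by_activity, 2):
        scored = sorted(
            ((f, cohens_d(vals[a][f], vals[b][f])) for f in FEATURES),
            key=lambda x: -x[1],
        )
        kb[(a, b)] = scored  # most discriminative feature listed first
    return kb

Given labelled example windows, build_knowledge_base returns, for each activity pair such as ("walking", "running"), a ranked list like [("std_mag", 2.1), ("energy", 1.4), ...] that an LLM agent can cite as evidence when distinguishing the two activities.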