Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
Yiming Du, Baojun Wang, Yifan Xiang, Zhaowei Wang, Wenyu Huang, Boyang Xue, Bin Liang, Xingshan Zeng, Fei Mi, Haoli Bai, Lifeng Shang, Jeff Z. Pan, Yuxin Jiang, Kam-Fai Wong
2025-12-24
Summary
This paper introduces a new system, Memory-T1, designed to help computer programs understand conversations that span a long time and involve many back-and-forths, specifically when answering questions about what was said earlier.
What's the problem?
When computers try to understand long conversations, they often struggle to remember and correctly use information from the past, especially as the conversation gets longer and more cluttered with irrelevant details. Existing computer models lose accuracy when trying to figure out *when* something was said, which is crucial for answering questions like 'What did I say yesterday?' or 'What happened before this event?'
What's the solution?
The researchers created Memory-T1, which works in two steps. First, it quickly narrows the conversation history down to the parts most likely to be relevant, based on when they happened and how closely their content relates to the question. Then it uses a 'learning by trial and error' method, called reinforcement learning, to pinpoint the exact parts of the conversation that contain the needed evidence. This learning process is guided by rewards for producing the right answer, citing the correct evidence, and keeping the timing of the selected information consistent with the question. Importantly, the system is rewarded for getting both the general order of events and the precise timing of individual statements right.
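To make the first, coarse step concrete, here is a minimal sketch in Python of pruning a dialogue history with a time filter followed by a relevance filter. The function name, the fixed time window, and the keyword-overlap scorer are illustrative assumptions of ours, not the paper's actual filters.

```python
from datetime import datetime, timedelta

def prune_sessions(sessions, query_time, query_keywords,
                   window_days=7, top_k=5):
    """Coarse stage (illustrative): keep sessions whose timestamp falls
    within a window around the query time, then rank the survivors by
    simple keyword overlap with the question and keep the top_k."""
    window = timedelta(days=window_days)
    in_window = [s for s in sessions
                 if abs(s["time"] - query_time) <= window]

    def relevance(session):
        # Crude stand-in for a real relevance scorer: count shared words.
        words = set(session["text"].lower().split())
        return len(words & set(query_keywords))

    return sorted(in_window, key=relevance, reverse=True)[:top_k]

# Example: only recent, on-topic sessions survive the coarse filter.
now = datetime(2025, 1, 10)
sessions = [
    {"time": datetime(2025, 1, 9), "text": "we booked the flight to tokyo"},
    {"time": datetime(2024, 6, 1), "text": "old unrelated chat"},
    {"time": datetime(2025, 1, 8), "text": "talked about the weather"},
]
kept = prune_sessions(sessions, now, ["flight", "tokyo"], top_k=2)
```

In the real system, this pruned candidate set is what the reinforcement-learning agent then searches over in the fine-grained second step.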
Why it matters?
Memory-T1 significantly improves the ability of computer programs to understand long conversations, achieving a new top performance level for openly available models. It's better at handling noisy and lengthy conversations than previous methods, meaning it can maintain accuracy even when there's a lot of extra, unimportant information. This is a big step towards creating more helpful and reliable conversational AI assistants.
Abstract
Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents. However, existing works and our pilot study have shown that as dialogue histories grow in length and accumulate noise, current long-context models struggle to accurately identify temporally pertinent information, significantly impairing reasoning performance. To address this, we introduce Memory-T1, a framework that learns a time-aware memory selection policy using reinforcement learning (RL). It employs a coarse-to-fine strategy, first pruning the dialogue history into a candidate set using temporal and relevance filters, followed by an RL agent that selects the precise evidence sessions. The RL training is guided by a multi-level reward function optimizing (i) answer accuracy, (ii) evidence grounding, and (iii) temporal consistency. In particular, the temporal consistency reward provides a dense signal by evaluating alignment with the query time scope at both the session-level (chronological proximity) and the utterance-level (chronological fidelity), enabling the agent to resolve subtle chronological ambiguities. On the Time-Dialog benchmark, Memory-T1 boosts a 7B model to an overall score of 67.0%, establishing a new state-of-the-art performance for open-source models and outperforming a 14B baseline by 10.2%. Ablation studies show temporal consistency and evidence grounding rewards jointly contribute to a 15.0% performance gain. Moreover, Memory-T1 maintains robustness up to 128k tokens, where baseline models collapse, proving effectiveness against noise in extensive dialogue histories. The code and datasets are publicly available at https://github.com/Elvin-Yiming-Du/Memory-T1/
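The abstract names three reward signals: answer accuracy, evidence grounding, and temporal consistency. A minimal sketch of how such signals could be combined into one scalar reward is shown below; the weights, the F1-based grounding term, and the in-scope-fraction temporal term are our own illustrative assumptions, not the paper's exact formulas (which also distinguish session-level from utterance-level alignment).

```python
def multi_level_reward(answer_correct, selected, gold_evidence,
                       selected_times, query_scope,
                       w_acc=1.0, w_ev=0.5, w_time=0.5):
    """Illustrative combination of three reward signals:
    - r_acc: 1 if the final answer is correct, else 0.
    - r_ev:  F1 between the selected sessions and the gold evidence set.
    - r_time: fraction of selected timestamps inside the query's time scope.
    Weights and exact terms are assumptions for this sketch."""
    r_acc = 1.0 if answer_correct else 0.0

    sel, gold = set(selected), set(gold_evidence)
    if sel and gold:
        precision = len(sel & gold) / len(sel)
        recall = len(sel & gold) / len(gold)
        denom = precision + recall
        r_ev = 2 * precision * recall / denom if denom else 0.0
    else:
        r_ev = 0.0

    lo, hi = query_scope
    r_time = (sum(lo <= t <= hi for t in selected_times) / len(selected_times)
              if selected_times else 0.0)

    return w_acc * r_acc + w_ev * r_ev + w_time * r_time

# Example: correct answer, one of two selected sessions is gold evidence,
# and one of two selected timestamps falls inside the query scope.
reward = multi_level_reward(True, ["s1", "s2"], ["s1"], [3, 9], (0, 5))
```

A dense temporal term like this gives the agent gradient signal even when the final answer is wrong, which is the role the abstract attributes to the temporal consistency reward.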