When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

Leheng Sheng, Yongtao Zhang, Wenchang Ma, Yaorui Shi, Ting Huang, Xiang Wang, An Zhang, Ke Shen, Tat-Seng Chua

2026-02-12

Summary

This paper focuses on improving how large language models, or LLMs, handle very long pieces of text when trying to reason or answer questions about them.

What's the problem?

LLMs struggle with long texts: their performance degrades as the context grows. A previous attempt to fix this, called MemAgent, processed the text in chunks and updated a textual 'memory' that was used to produce the final answer. However, MemAgent's memory could grow uncontrollably because it was rewritten on every chunk, even ones containing nothing useful, and the loop had no way to stop early, so it kept processing long after enough evidence had been gathered, wasting time and compute.
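
To make the failure modes concrete, here is a minimal Python sketch of that kind of recurrent loop. The `llm` callable and the prompts are illustrative assumptions (a function that takes a prompt string and returns the model's text response), not MemAgent's actual code.

```python
# Minimal sketch of a MemAgent-style recurrent memory loop.
# `llm` is an assumed placeholder: prompt string in, model text out.

def naive_memory_loop(chunks, question, llm):
    memory = ""  # textual memory carried across chunks
    for chunk in chunks:
        # The memory is rewritten on EVERY chunk, even evidence-free
        # ones, which is how it can fill up with irrelevant content.
        memory = llm(
            f"Question: {question}\n"
            f"Current memory: {memory}\n"
            f"New chunk: {chunk}\n"
            "Rewrite the memory, keeping what helps answer the question."
        )
    # There is no early exit: every chunk is processed, even if the
    # answer was already determined halfway through.
    return llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
```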

What's the solution?

The researchers introduced a new system called GRU-Mem, which uses two 'gates' (think of them as controls) to manage the memory more effectively. One gate decides *when* to update the memory, opening only if a new chunk actually contains useful information. The other gate decides *when* to stop processing, ending the loop as soon as enough evidence has been gathered. The researchers trained these gating behaviors with reinforcement-learning rewards that encourage correct updating and exiting decisions.
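
A hedged sketch of the gated control flow might look like the following; the gate prompts, the OPEN/CLOSED convention, and the `llm` callable are illustrative assumptions rather than the paper's exact prompts or interface.

```python
# Sketch of the GRU-Mem control flow with two text-controlled gates.
# As above, `llm` is an assumed prompt-in/text-out function.

def gated_memory_loop(chunks, question, llm):
    memory = ""
    for chunk in chunks:
        # Update gate: rewrite the memory only when the chunk is useful.
        update_gate = llm(
            f"Question: {question}\nMemory: {memory}\nChunk: {chunk}\n"
            "Does this chunk add useful evidence? Reply OPEN or CLOSED."
        )
        if update_gate.strip() == "OPEN":
            memory = llm(
                f"Question: {question}\nMemory: {memory}\nChunk: {chunk}\n"
                "Update the memory with the new evidence."
            )
        # Exit gate: stop as soon as the memory suffices to answer.
        exit_gate = llm(
            f"Question: {question}\nMemory: {memory}\n"
            "Is the evidence sufficient to answer? Reply OPEN or CLOSED."
        )
        if exit_gate.strip() == "OPEN":
            break  # remaining chunks are skipped entirely
    return llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
```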

Why it matters?

This work is important because it makes LLMs much better at understanding and reasoning over long documents. GRU-Mem not only outperforms the previous method but also runs much faster, speeding up inference by up to 400%, which is crucial for real-world applications that need to process large amounts of text.

Abstract

While reasoning over long context is crucial for various real-world applications, it remains challenging for large language models (LLMs), which suffer from performance degradation as the context length grows. The recent work MemAgent tried to tackle this by processing context chunk-by-chunk in an RNN-like loop and updating a textual memory for final answering. However, this naive recurrent memory update has two crucial drawbacks: (i) the memory can quickly explode because it updates indiscriminately, even on evidence-free chunks; and (ii) the loop lacks an exit mechanism, leading to unnecessary computation even after sufficient evidence has been collected. To address these issues, we propose GRU-Mem, which incorporates two text-controlled gates for more stable and efficient long-context reasoning. Specifically, in GRU-Mem, the memory updates only when the update gate is open, and the recurrent loop exits immediately once the exit gate is open. To endow the model with these capabilities, we introduce two reward signals, r^{update} and r^{exit}, within end-to-end RL, rewarding correct updating and exiting behaviors respectively. Experiments on various long-context reasoning tasks demonstrate the effectiveness and efficiency of GRU-Mem, which generally outperforms the vanilla MemAgent while delivering up to 400% inference speedup.
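
As a rough illustration of what rewarding "correct updating and exiting behaviors" could mean, the sketch below scores each gate decision against ground-truth labels for the step; the paper's actual definitions of r^{update} and r^{exit} may differ.

```python
# Illustrative (assumed) shaping of the two gate rewards: each gate is
# rewarded when its decision matches the ground truth for that step.

def gate_rewards(update_opened, chunk_has_evidence,
                 exit_opened, evidence_sufficient):
    # r^{update}: reward opening the update gate exactly on
    # evidence-bearing chunks, penalize mismatches (assumed shaping).
    r_update = 1.0 if update_opened == chunk_has_evidence else -1.0
    # r^{exit}: reward opening the exit gate exactly once the collected
    # evidence suffices to answer; penalize early or late exits.
    r_exit = 1.0 if exit_opened == evidence_sufficient else -1.0
    return r_update, r_exit
```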