Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

Yuxiang Zhang, Jiangming Shu, Ye Ma, Xueyuan Lin, Shangxi Wu, Jitao Sang

2025-10-15

Summary

This paper explores how to make AI agents, powered by large language models, better at complex, long-horizon tasks that require keeping track of information over many steps.

What's the problem?

Large language models struggle with tasks that require tracking lots of information, because their working memory (the context window) gets overloaded with distracting or irrelevant details. Current methods for managing that memory usually bolt on external, heuristic mechanisms that sit outside the model's own decision-making and don't adapt to the task at hand. Essentially, the AI never learns *how* to manage its own memory effectively.

What's the solution?

The researchers propose a new approach called 'Memory-as-Action,' where the AI actively decides what information to keep and what to discard as part of its normal decision-making process. It's like the AI editing its own notes while trying to solve a problem. However, these edits rewrite earlier context instead of only appending to it, which breaks the usual way such models learn from experience (standard training assumes the context only ever grows). To fix this, they also developed a new learning algorithm, 'Dynamic Context Policy Optimization,' that allows the AI to learn effectively even while changing its memory (see the sketch below).
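To make the idea concrete, here is a minimal Python sketch of an agent whose action space mixes ordinary task actions with explicit memory edits. All names here (WorkingMemory, agent_step, the MEM_DELETE action string) are illustrative assumptions for exposition, not the paper's actual interface:

```python
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """The agent's editable context: a list of notes."""
    entries: list[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.entries.append(note)

    def delete(self, index: int) -> None:
        # A deletion rewrites the context non-prefix-wise; this is the
        # kind of edit that "fractures" the trajectory for standard RL.
        del self.entries[index]

def agent_step(policy, memory: WorkingMemory, observation: str) -> str:
    """One decision step: the same policy emits task actions or memory edits."""
    memory.add(observation)                 # keep the new observation by default
    context = "\n".join(memory.entries)
    action = policy(context)
    if action.startswith("MEM_DELETE "):    # an explicit memory-editing action
        memory.delete(int(action.split()[1]))
        return "memory edited"
    return action                           # an ordinary task action
```

The key point of the framework is that memory curation goes through the same policy that chooses task actions, so reinforcement learning can shape both jointly.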

Why it matters?

This work is important because it allows AI agents to use their resources more efficiently and perform better on complex tasks. By letting the AI learn to manage its own memory, it can focus on the most relevant information and avoid getting distracted, leading to smarter and more capable AI systems.

Abstract

Large Language Models face challenges in long-horizon agentic tasks as their constrained memory is easily overwhelmed by distracting or irrelevant context. Existing working memory methods typically rely on external, heuristic mechanisms that are decoupled from the agent's core policy. In this work, we reframe working memory management as a learnable, intrinsic capability. We propose a novel framework, Memory-as-Action, where an agent actively manages its working memory by executing explicit editing operations as part of a unified policy. This formulation allows an agent, trained via reinforcement learning, to balance memory curation against long-term task objectives under given resource constraints. However, such memory editing actions break the standard assumption of a continuously growing prefix in LLM interactions, leading to what we call trajectory fractures. These non-prefix changes disrupt the causal continuity required by standard policy gradient methods, making those methods inapplicable. To address this, we propose a new algorithm, Dynamic Context Policy Optimization, which enables stable end-to-end reinforcement learning by segmenting trajectories at memory action points and applying trajectory-level advantages to the resulting action segments. Our results demonstrate that jointly optimizing for task reasoning and memory management in an end-to-end fashion not only reduces overall computational consumption but also improves task performance, driven by adaptive context curation strategies tailored to the model's intrinsic capabilities.
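As a rough illustration of the segmentation idea in the abstract, the sketch below splits a rollout at memory-edit points and applies a single trajectory-level advantage to each resulting segment. The data layout and function names are assumptions for exposition, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Step:
    logprob: float        # log-probability of the tokens emitted at this step
    is_memory_edit: bool  # True if this action rewrote the working memory

def segment_at_memory_edits(trajectory: list[Step]) -> list[list[float]]:
    """Split a rollout into spans whose context grew as a pure prefix,
    cutting wherever a memory edit fractured the trajectory."""
    segments, current = [], []
    for step in trajectory:
        current.append(step.logprob)
        if step.is_memory_edit:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

def dcpo_loss(trajectory: list[Step], trajectory_advantage: float) -> float:
    """Policy-gradient surrogate: the same trajectory-level advantage
    weights every segment, but no term spans across a fracture."""
    loss = 0.0
    for segment in segment_at_memory_edits(trajectory):
        loss -= trajectory_advantage * sum(segment)
    return loss
```

Within each segment the context grows as a pure prefix, so the standard policy-gradient term is valid there; weighting all segments by one trajectory-level advantage ties them back to the overall task outcome.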