Scaling Long-Horizon LLM Agent via Context-Folding
Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, Jiecao Chen
2025-10-15
Summary
This paper introduces a new method called Context-Folding that helps AI agents, specifically those powered by large language models, work on tasks that require many steps without getting bogged down by memory limitations.
What's the problem?
Large language models are really good at many things, but they struggle with tasks that take a long time to complete because they can only 'remember' a limited amount of information at once. This 'memory' is called context length, and when a task goes on for too long, the AI forgets what happened earlier, making it hard to finish successfully. Existing methods like simply summarizing information don't fully solve this problem.
What's the solution?
The researchers developed Context-Folding, which allows the AI agent to break down a big task into smaller, manageable sub-tasks. When a sub-task is finished, instead of keeping all the details of *how* it was done, the agent creates a short summary of the *result*. This summary is then added to its 'memory', freeing up space for new information. They also created a way to train the AI to do this effectively using a reinforcement learning approach called FoldGRPO, rewarding it for good task breakdown and efficient memory management.
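The branch-and-fold idea described above can be sketched in a few lines of code. This is a minimal illustration under assumed names (`FoldingContext`, `branch`, `fold` are all hypothetical, not the paper's actual API): the agent's working context is a list of messages, a branch records a sub-task's steps separately, and folding replaces those steps with a single summary message.

```python
# Illustrative sketch of context-folding; all class and method names are
# assumptions for this example, not the paper's implementation.

class FoldingContext:
    def __init__(self):
        self.context = []    # active working context (what the LLM sees)
        self._branch = None  # steps of the open sub-trajectory, if any

    def add(self, message):
        """Append a step to the active context, or to the open branch."""
        if self._branch is not None:
            self._branch.append(message)
        else:
            self.context.append(message)

    def branch(self, subtask):
        """Open a sub-trajectory to handle one subtask."""
        assert self._branch is None, "nested branches not modeled here"
        self._branch = [f"[branch] {subtask}"]

    def fold(self, summary):
        """Close the branch: discard its intermediate steps, keep a summary."""
        assert self._branch is not None
        self._branch = None
        self.context.append(f"[folded] {summary}")


ctx = FoldingContext()
ctx.add("user: fix the failing test in repo X")
ctx.branch("locate the failing test")
ctx.add("tool: grep -r 'def test_' tests/")   # intermediate step, later folded
ctx.add("tool: run the test suite")           # intermediate step, later folded
ctx.fold("failing test is tests/test_io.py::test_roundtrip")
print(len(ctx.context))  # prints 2: the tool steps are gone, only the summary remains
```

The point of the sketch is the memory accounting: after `fold`, the two intermediate tool calls no longer occupy the active context, only the one-line outcome summary does, which is what lets the active context stay small over long horizons.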
Why does it matter?
This is important because it allows AI agents to tackle much more complex and lengthy tasks than before. By using Context-Folding, the AI can perform just as well, or even better, than existing methods while using significantly less 'memory', making it more efficient and capable of handling real-world problems like in-depth research or software development.
Abstract
Large language model (LLM) agents are fundamentally constrained by context length on long-horizon tasks. We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome. To make this behavior learnable, we develop an end-to-end reinforcement learning framework, FoldGRPO, with specific process rewards to encourage effective task decomposition and context management. On complex long-horizon tasks (Deep Research and SWE), our folding agent matches or outperforms the ReAct baselines while using an active context 10x smaller, and significantly outperforms models that rely on summarization-based context management.