AgentFold: Long-Horizon Web Agents with Proactive Context Management
Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Pengjun Xie, Fei Huang, Siheng Chen, Jingren Zhou, Yong Jiang
2025-10-29
Summary
This paper introduces AgentFold, a new way to build web-browsing AI agents that are better at handling complex, long-horizon tasks requiring many steps and large amounts of information.
What's the problem?
Current AI agents that browse the web and try to complete tasks often struggle when those tasks are long and complicated. They either get bogged down by remembering *everything* they've done (leading to slow performance and irrelevant information), or they summarize too much and forget important details needed later on. It's a trade-off between keeping too much information and losing crucial context.
What's the solution?
AgentFold tackles this problem by mimicking how humans remember things. Instead of just logging everything or constantly summarizing, it actively *manages* its memory. At each step it can 'fold' information: either condensing specific details to keep them handy, or abstracting away entire sections of a sub-task it has already finished. This allows it to focus on what's important without losing track of the bigger picture. The researchers trained this agent with simple supervised fine-tuning, without continual pre-training or reinforcement learning.
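The folding idea can be illustrated with a minimal sketch. This is not the paper's implementation; the `Block` and `fold` names are hypothetical, and it only shows the two scales of folding the paper describes: condensing a single step, or consolidating a whole finished sub-task into one summary block.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One entry in the agent's working context: a raw step or a folded summary."""
    steps: tuple  # inclusive range of original step indices, e.g. (3, 3) or (1, 5)
    text: str

def fold(context, start, end, summary):
    """Replace the blocks covering original steps start..end with one summary block.

    A (k, k) fold is a granular condensation of a single step; a wider range
    is a deep consolidation of an entire finished multi-step sub-task.
    """
    kept_before = [b for b in context if b.steps[1] < start]
    kept_after = [b for b in context if b.steps[0] > end]
    return kept_before + [Block((start, end), summary)] + kept_after

# Example: a 4-step trajectory where steps 1-3 were a finished search sub-task.
context = [Block((i, i), f"step {i}: raw observation + action") for i in range(1, 5)]
context = fold(context, 1, 3, "searched site X; found candidate page Y")
assert [b.steps for b in context] == [(1, 3), (4, 4)]
```

The point of the sketch is that the context stays a short list of multi-scale blocks rather than an ever-growing raw log, while a fine-grained block (step 4 here) survives untouched next to a coarse summary.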
Why it matters?
This is a big deal because AgentFold achieves impressive results, even outperforming much larger and more sophisticated AI agents, including some that are proprietary (like those from OpenAI). It shows that a smarter approach to memory management can make AI agents significantly more effective at tackling real-world, complex tasks on the web, without needing massive amounts of computing power.
Abstract
LLM-based web agents show immense promise for information seeking, yet their effectiveness on long-horizon tasks is hindered by a fundamental trade-off in context management. Prevailing ReAct-based agents suffer from context saturation as they accumulate noisy, raw histories, while methods that fixedly summarize the full history at each step risk the irreversible loss of critical details. Addressing these, we introduce AgentFold, a novel agent paradigm centered on proactive context management, inspired by the human cognitive process of retrospective consolidation. AgentFold treats its context as a dynamic cognitive workspace to be actively sculpted, rather than a passive log to be filled. At each step, it learns to execute a 'folding' operation, which manages its historical trajectory at multiple scales: it can perform granular condensations to preserve vital, fine-grained details, or deep consolidations to abstract away entire multi-step sub-tasks. The results on prominent benchmarks are striking: with simple supervised fine-tuning (without continual pre-training or RL), our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp and 47.3% on BrowseComp-ZH. Notably, this performance not only surpasses or matches open-source models of a dramatically larger scale, such as the DeepSeek-V3.1-671B-A37B, but also surpasses leading proprietary agents like OpenAI's o4-mini.