ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Minhao Cheng, Shuai Wang, Hong Cheng, Jingren Zhou
2025-09-17
Summary
This paper introduces ReSum, a new way for web-browsing AI agents built on large language models to solve complex research questions, overcoming a key memory limitation of current methods.
What's the problem?
AI agents powered by large language models are really good at tasks that require looking up information online, but they have trouble with complicated questions that need a lot of searching and remembering what they've already found. These agents work by having a 'conversation' with the web, but this conversation gets too long to fit within the model's memory, stopping them from finding complete answers. Essentially, they 'forget' important details as they go.
What's the solution?
The researchers developed a system called ReSum that solves this by periodically summarizing the agent's past interactions. Instead of trying to remember every single step, the agent creates a condensed 'reasoning state' that captures the important information. This allows the agent to continue exploring without running out of memory. They also created a training method, ReSum-GRPO, to help the agent learn how to best use these summaries to make decisions.
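The idea in the paragraph above can be sketched as a simple agent loop: keep appending tool observations as usual, but once the history nears the context budget, compress it into a compact reasoning state and continue from there. This is a minimal toy sketch, not the paper's implementation; the function names, the token counter, and the string-truncating summarizer are all stand-ins (a real system would prompt an LLM to produce the summary).

```python
# Toy sketch of the ReSum loop. All names here are hypothetical;
# the paper's actual prompts, summarizer model, and budgets differ.

def count_tokens(messages):
    # Stand-in token counter: whitespace-split words across all messages.
    return sum(len(m.split()) for m in messages)

def summarize(messages):
    # Stand-in for an LLM summarizer: compress the history into one
    # compact "reasoning state" message. A real system would ask an LLM
    # to extract verified facts and remaining open questions.
    return "SUMMARY: " + " | ".join(m[:30] for m in messages)

def resum_loop(question, search_step, is_done, budget=50, max_steps=20):
    history = [question]
    for _ in range(max_steps):
        observation = search_step(history)   # one tool call (search/visit)
        history.append(observation)
        if is_done(observation):
            return history
        if count_tokens(history) > budget:
            # Context is about to overflow: replace the raw history with
            # a compact reasoning state and keep exploring from it.
            history = [question, summarize(history)]
    return history
```

The contrast with ReAct is the `if count_tokens(...) > budget` branch: a plain ReAct loop appends every step until the context window is exhausted, whereas here exploration can continue indefinitely because the history is periodically collapsed back to two messages.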
Why it matters?
This work is important because it allows AI agents to tackle much more complex web-based tasks than before. By overcoming the memory limitations, these agents can handle questions that require extensive research and reasoning, leading to more accurate and complete solutions. The results show significant improvements over existing methods, even with only about 1,000 training samples, and demonstrate a path towards more capable and helpful AI assistants.
Abstract
Large Language Model (LLM)-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust context budgets before reaching complete solutions. To overcome this challenge, we introduce ReSum, a novel paradigm that enables indefinite exploration through periodic context summarization. ReSum converts growing interaction histories into compact reasoning states, maintaining awareness of prior discoveries while bypassing context constraints. For paradigm adaptation, we propose ReSum-GRPO, integrating GRPO with segmented trajectory training and advantage broadcasting to familiarize agents with summary-conditioned reasoning. Extensive experiments on web agents of varying scales across three benchmarks demonstrate that ReSum delivers an average absolute improvement of 4.5% over ReAct, with further gains of up to 8.2% following ReSum-GRPO training. Notably, with only 1K training samples, our WebResummer-30B (a ReSum-GRPO-trained version of WebSailor-30B) achieves 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en, surpassing existing open-source web agents.