
WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking

Zhengwei Tao, Haiyang Shen, Baixuan Li, Wenbiao Yin, Jialong Wu, Kuan Li, Zhongwang Zhang, Huifeng Yin, Rui Ye, Liwen Zhang, Xinyu Wang, Pengjun Xie, Jingren Zhou, Yong Jiang

2025-10-29


Summary

This paper focuses on improving how AI agents, specifically those powered by large language models, find information online to solve problems. These agents need to be able to search effectively, but current systems often struggle with this, taking too many steps to find what they need.

What's the problem?

The main issue is that AI agents aren't very efficient at searching for information. They often look at a lot of irrelevant stuff before finding the answer. This happens because the training data used to teach these agents doesn't have enough examples of the specific things they need to find, making it hard for them to learn good search strategies. Essentially, they haven't 'seen' enough of what they're looking for during training.

What's the solution?

The researchers created a new system called WebLeaper. It works by building more comprehensive training tasks that pack many more target entities into each task, so the agent gets far more practice finding the kinds of things it will need to search for. They designed these tasks like a branching tree, allowing many answers to be embedded within a single, constrained task. They also used Wikipedia tables to create three types of search challenges – Basic, Union, and Reverse-Union – to gradually increase both the efficiency and the effectiveness of the agent's searching. Finally, they trained the agent only on search paths that were both successful *and* efficient, so it learns to be correct and quick at the same time.
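The last step, keeping only trajectories that are both accurate and efficient, can be sketched as a simple filter. This is a minimal illustration, not the paper's actual pipeline; the `Trajectory` class, the exact-match correctness check, and the `step_budget` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """One recorded search attempt: the final answer and the steps taken."""
    answer: str
    num_steps: int

def filter_trajectories(trajectories, gold_answer, step_budget):
    """Keep only trajectories that are correct AND within the step budget."""
    return [
        t for t in trajectories
        if t.answer == gold_answer and t.num_steps <= step_budget
    ]

# Three attempts at the same task; only one is both right and fast.
attempts = [
    Trajectory(answer="Paris", num_steps=12),  # correct but inefficient
    Trajectory(answer="Lyon", num_steps=3),    # fast but wrong
    Trajectory(answer="Paris", num_steps=4),   # correct and efficient
]
kept = filter_trajectories(attempts, gold_answer="Paris", step_budget=6)
print(len(kept))  # → 1
```

Training only on the surviving trajectories is what pushes the model toward search behavior that is both correct and short.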

Why does it matter?

This research is important because it makes AI agents more capable of solving complex problems that require online research. By improving search efficiency, these agents can work faster and more effectively, which is crucial for real-world applications like answering questions, completing tasks, and making decisions. It’s a step towards creating AI that can truly think and act independently.

Abstract

Large Language Model (LLM)-based agents have emerged as a transformative approach for open-ended problem solving, with information seeking (IS) being a core capability that enables autonomous reasoning and decision-making. While prior research has largely focused on improving retrieval depth, we observe that current IS agents often suffer from low search efficiency, which in turn constrains overall performance. A key factor underlying this inefficiency is the sparsity of target entities in training tasks, which limits opportunities for agents to learn and generalize efficient search behaviors. To address these challenges, we propose WebLeaper, a framework for constructing high-coverage IS tasks and generating efficient solution trajectories. We formulate IS as a tree-structured reasoning problem, enabling a substantially larger set of target entities to be embedded within a constrained context. Leveraging curated Wikipedia tables, we propose three variants for synthesizing IS tasks, Basic, Union, and Reverse-Union, to systematically increase both IS efficiency and efficacy. Finally, we curate training trajectories by retaining only those that are simultaneously accurate and efficient, ensuring that the model is optimized for both correctness and search performance. Extensive experiments on both basic and comprehensive settings, conducted on five IS benchmarks, BrowseComp, GAIA, xbench-DeepSearch, WideSearch, and Seal-0, demonstrate that our method consistently achieves improvements in both effectiveness and efficiency over strong baselines.
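To make the table-based task variants concrete, here is a toy sketch of how a Basic task (one target entity) differs from a Union task (many target entities packed into one question). The table contents, question templates, and function names are invented for illustration; the paper's actual synthesis from curated Wikipedia tables, and its Reverse-Union variant, are more involved.

```python
# A toy Wikipedia-style table: each row is (country, capital, continent).
table = [
    ("France", "Paris", "Europe"),
    ("Japan", "Tokyo", "Asia"),
    ("Brazil", "Brasilia", "South America"),
    ("Germany", "Berlin", "Europe"),
]

def basic_task(row):
    """Basic variant: one question with a single target entity."""
    country, capital, _continent = row
    return f"What is the capital of {country}?", {capital}

def union_task(rows, continent):
    """Union variant: one question whose answer set spans many rows,
    embedding several target entities in a single constrained task."""
    targets = {cap for _country, cap, cont in rows if cont == continent}
    question = f"List the capitals of every {continent}an country in the table."
    return question, targets

_, answers = union_task(table, "Europe")
print(sorted(answers))  # → ['Berlin', 'Paris']
```

A Reverse-Union task would presumably invert this direction, e.g. recovering which table condition a given answer set satisfies, forcing the agent to reason over many entities at once rather than one at a time.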