WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Junteng Liu, Yunji Li, Chi Zhang, Jingyang Li, Aili Chen, Ke Ji, Weiyu Cheng, Zijia Wu, Chengyu Du, Qidi Xu, Jiayuan Song, Zhengmao Zhu, Wenhu Chen, Pengyu Zhao, Junxian He

2025-09-09

Summary

This paper focuses on building better AI agents that can use the internet to find information and solve complex problems, like answering tricky questions that require searching multiple websites.

What's the problem?

Existing open-source web agents either struggle to find the right information on complex tasks or lack transparent implementations, so it is unclear how they were built. The researchers trace the core issue to a shortage of training data that is challenging enough to teach agents how to search and reason across web content.

What's the solution?

The researchers created WebExplorer, a systematic method for generating high-quality training data for web agents. A model first explores information on the web, and the resulting queries are then evolved iteratively from long to short. The outcome is a set of challenging question-and-answer pairs that require multi-step reasoning and complex web navigation. They used this data to train a new agent, WebExplorer-8B, with supervised fine-tuning followed by reinforcement learning. The model supports a 128K context length and up to 100 tool-calling turns, which lets it handle long and complex searches.
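The explore-then-evolve idea described above can be pictured as a small loop. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the names `explore_and_evolve`, `QAPair`, and the three callables are hypothetical, the model calls are replaced by toy functions, and the way a query gets shorter (here, dropping one clue per round) is an illustrative guess that this summary does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    query: str
    answer: str


def explore_and_evolve(
    seed: str,
    explore: Callable[[str], List[str]],
    draft: Callable[[List[str]], QAPair],
    evolve: Callable[[QAPair], QAPair],
    rounds: int = 3,
) -> List[QAPair]:
    """Return the initial QA pair followed by each accepted evolved version."""
    facts = explore(seed)  # exploration step (a model call in practice, stubbed here)
    pair = draft(facts)    # initial, long query built from the gathered facts
    history = [pair]
    for _ in range(rounds):
        candidate = evolve(pair)
        # Accept a rewrite only if it shortens the query and keeps the answer.
        if candidate.answer != pair.answer or len(candidate.query) >= len(pair.query):
            break
        pair = candidate
        history.append(pair)
    return history


# Toy stand-ins for the model calls, present only to make the sketch runnable.
def toy_explore(seed: str) -> List[str]:
    return [f"{seed} fact {i}" for i in range(3)]


def toy_draft(facts: List[str]) -> QAPair:
    return QAPair(query="Which entity matches: " + "; ".join(facts) + "?", answer="entity-x")


def toy_evolve(pair: QAPair) -> QAPair:
    # Hypothetical evolution rule: drop the last clue from the query.
    body = pair.query.rstrip("?")
    head, _, _ = body.rpartition("; ")
    return QAPair(query=(head or body) + "?", answer=pair.answer)


history = explore_and_evolve("seed", toy_explore, toy_draft, toy_evolve)
for step, item in enumerate(history):
    print(step, len(item.query), item.query)
```

Passing the model calls in as plain callables keeps the loop itself independent of any particular model or search backend, which are not described in this summary.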

Why does it matter?

This work shows a practical way to build web agents that can handle long-horizon, complex tasks. Although WebExplorer-8B is a relatively small model, it achieves higher accuracy than the much larger WebSailor-72B on BrowseComp-en/zh and the best performance among models up to 100B parameters on WebWalkerQA and FRAMES, which suggests that challenging training data is a key ingredient. This could lead to AI systems that are better at assisting with research, problem-solving, and finding information online.

Abstract

The paradigm of Large Language Models (LLMs) has increasingly shifted toward agentic applications, where web browsing capabilities are fundamental for retrieving information from diverse online sources. However, existing open-source web agents either demonstrate limited information-seeking abilities on complex tasks or lack transparent implementations. In this work, we identify that the key challenge lies in the scarcity of challenging data for information seeking. To address this limitation, we introduce WebExplorer: a systematic data generation approach using model-based exploration and iterative, long-to-short query evolution. This method creates challenging query-answer pairs that require multi-step reasoning and complex web navigation. By leveraging our curated high-quality dataset, we successfully develop an advanced web agent, WebExplorer-8B, through supervised fine-tuning followed by reinforcement learning. Our model supports 128K context length and up to 100 tool calling turns, enabling long-horizon problem solving. Across diverse information-seeking benchmarks, WebExplorer-8B achieves state-of-the-art performance at its scale. Notably, as an 8B-sized model, WebExplorer-8B is able to effectively search over an average of 16 turns after RL training, achieving higher accuracy than WebSailor-72B on BrowseComp-en/zh and attaining the best performance among models up to 100B parameters on WebWalkerQA and FRAMES. Beyond these information-seeking tasks, our model also achieves strong generalization on the HLE benchmark even though it is only trained on knowledge-intensive QA data. These results highlight our approach as a practical path toward long-horizon web agents.
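The 100-turn tool-calling budget mentioned in the abstract can be pictured as a bounded agent loop. The sketch below is illustrative only and is not the authors' implementation: `run_agent`, the `policy` callable, the `"search"` tool name, and the action format are all hypothetical, and the 128K context limit is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_TURNS = 100  # tool-calling turn budget stated in the abstract

# Hypothetical action format: (kind, argument), where kind is a tool name or "answer".
Action = Tuple[str, str]


def run_agent(
    question: str,
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = MAX_TURNS,
) -> Tuple[Optional[str], int]:
    """Return (final answer or None, number of tool calls made)."""
    context = [question]
    for turn in range(max_turns):
        kind, arg = policy(context)
        if kind == "answer":
            return arg, turn
        # Call the chosen tool and append its observation to the context.
        context.append(tools[kind](arg))
    return None, max_turns  # budget exhausted without a final answer


# Toy policy and tool, present only to make the sketch runnable.
def toy_policy(context: List[str]) -> Action:
    if len(context) < 4:
        return ("search", f"query {len(context)}")
    return ("answer", context[-1])


toy_tools = {"search": lambda q: f"result for {q}"}

answer, tool_calls = run_agent("example question", toy_policy, toy_tools)
print(answer, tool_calls)
```

In this toy run the policy stops after three tool calls; a trained agent would decide for itself when it has gathered enough evidence, up to the turn budget.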
