WideSearch: Benchmarking Agentic Broad Info-Seeking
Ryan Wong, Jiawei Wang, Junjie Zhao, Li Chen, Yan Gao, Long Zhang, Xuan Zhou, Zuo Wang, Kai Xiang, Ge Zhang, Wenhao Huang, Yang Wang, Ke Wang
2025-08-12
Summary
This paper introduces WideSearch, a new benchmark that tests how reliably automated search agents can gather large amounts of detailed information. It checks whether these AI systems can collect, verify, and organize many small facts across many topics from the internet.
What's the problem?
Many tasks require searching through a huge amount of information, which is repetitive and time-consuming for humans. Although AI search agents are being developed to help with this, there has been no good way to measure how well they perform on these big, broad searches. Current systems struggle to complete these wide-scale tasks correctly and completely.
What's the solution?
To address this, the researchers built WideSearch with 200 carefully chosen questions from more than 15 different fields, each requiring large-scale information collection. They created a strict process to make sure these questions are challenging, verifiable, and require active searching beyond what the AI already 'knows'. They then tested more than 10 modern AI search agents, including different kinds of systems, to see how well they perform. The results showed that most agents barely succeed, with the best reaching only about 5% success. This indicates that substantial challenges remain and need more work.
Why it matters?
This matters because understanding how well AI agents can gather and organize information at a large scale helps guide future research to make these agents better. Improving them could save people a lot of time and effort in searching for information, making automated tools more useful for professional research, planning, and everyday tasks.
Abstract
WideSearch is a new benchmark evaluating the reliability of automated search agents in large-scale information collection tasks, revealing significant deficiencies in current systems.