BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

Peilin Zhou, Bruce Leon, Xiang Ying, Can Zhang, Yifan Shao, Qichen Ye, Dading Chong, Zhiling Jin, Chenxuan Xie, Meng Cao, Yuxin Gu, Sixin Hong, Jing Ren, Jian Chen, Chao Liu, Yining Hua

2025-05-09

Summary

This paper introduces BrowseComp-ZH, a benchmark designed to test how well large language models can browse and understand information from Chinese websites in real time.

What's the problem?

Most web-browsing benchmarks for AI models focus on English, so it is unclear how well these models handle Chinese websites, especially when they must find and reason over information online.

What's the solution?

The researchers created BrowseComp-ZH, which specifically measures how good AI models are at searching, retrieving, and making sense of information from the Chinese web. This helps identify where the models struggle and what needs to be improved for Chinese users.

Why does it matter?

People all over the world use the internet, so AI needs to work well in different languages. By focusing on Chinese web browsing, this research helps make AI more useful and fair for Chinese speakers, and it pushes evaluation beyond English-only benchmarks.

Abstract

BrowseComp-ZH evaluates large language models on real-time Chinese web browsing tasks, highlighting challenges in retrieval and reasoning beyond existing English benchmarks.
