Nested Browser-Use Learning for Agentic Information Seeking

Baixuan Li, Jialong Wu, Wenbiao Yin, Kuan Li, Zhongwang Zhang, Huifeng Yin, Zhengwei Tao, Liwen Zhang, Pengjun Xie, Jingren Zhou, Yong Jiang

2025-12-30

Summary

This paper introduces a way for AI agents to interact with the web that goes beyond retrieving search snippets: the agents can browse websites much as a human would.

What's the problem?

Current AI agents that look for information online mostly retrieve short text snippets or fetch pages by URL. They cannot operate a browser to explore a site, so they miss information that is only reachable by interacting with the page. Giving them full browser control is difficult: it requires fine-grained actions and returns long, noisy page content, which makes it hard for the agent to decide what to do next.

What's the solution?

The researchers propose Nested Browser-Use Learning (NestBrowse), a minimal but complete set of browser actions organized in a nested structure. Think of it like nested boxes: control of the interaction is separated from exploration of the page content. The agent decides the overall browsing strategy without having to reason over every detail of a page, which makes it easier to find the information it needs.
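The nested idea can be illustrated with a minimal sketch. This is not the paper's implementation: the outer/inner split shown here, the toy `PAGES` dictionary, the `Action` type, and the functions `explore_page` and `run_agent` are all hypothetical, and plain keyword overlap stands in for whatever relevance judgment a real agent would use. The point is only that the outer loop issues browser actions and receives condensed observations, while the inner loop reads the full page.

```python
from dataclasses import dataclass

# Hypothetical toy "web": URL -> page text. Stands in for a real browser backend.
PAGES = {
    "site/home": "Welcome to the archive.\nNavigation: reports, contact.\nCookie banner text.",
    "site/reports": "Annual report 2023: revenue grew 12 percent.\nFooter: terms and privacy.",
}

@dataclass
class Action:
    name: str  # only "visit" is used here; a real action set would be larger
    url: str
    goal: str  # what the agent hopes to learn from this page

def explore_page(text: str, goal: str) -> list[str]:
    """Inner loop: scan the page line by line, keep only goal-relevant lines.

    Keyword overlap is an assumed stand-in for a model-based relevance check.
    """
    goal_words = {w.lower() for w in goal.split()}
    kept = []
    for line in text.split("\n"):
        line_words = {w.strip(".,:").lower() for w in line.split()}
        if goal_words & line_words:
            kept.append(line)
    return kept

def run_agent(actions: list[Action]) -> list[str]:
    """Outer loop: issue browser actions and collect condensed observations."""
    context = []
    for action in actions:
        page = PAGES.get(action.url, "")
        # The outer loop never sees the raw page, only what the inner loop returns.
        context.extend(explore_page(page, action.goal))
    return context

observations = run_agent([
    Action("visit", "site/home", "reports"),
    Action("visit", "site/reports", "revenue 2023"),
])
print(observations)
```

In this toy run the outer loop's context holds two short lines rather than the full text of both pages, which is the kind of separation the nested design is meant to provide.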

Why does it matter?

This matters because it gives AI agents access to information that snippet retrieval and page fetching alone do not reach. That could make agents more capable at complex online tasks, such as researching a topic in depth or completing tasks that require interacting with websites.

Abstract

Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While full browser interaction could unlock deeper capabilities, its fine-grained control and verbose page content returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice. Further in-depth analyses underscore its efficiency and flexibility.
