Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward

Yong Deng, Guoqing Wang, Zhenzhe Ying, Xiaofeng Wu, Jinzhen Lin, Wenwen Xiong, Yuqin Dai, Shuo Yang, Zhanwei Zhang, Qiwen Wang, Yang Qin, Changhua Meng

2025-08-20

Summary

This paper introduces a way to make large language models (LLMs) better at complex research problems: it breaks their thinking process into small, manageable steps, scores each step with a dedicated reward model, and gradually shifts the training rewards over time from these step-level scores toward rewards for the final outcome.

What's the problem?

Large language models are good at solving many problems but get stuck on really tough ones because their built-in knowledge is fixed. Giving them access to outside information through retrieval helps, but rigid workflows still leave them struggling with tasks that need multiple reasoning steps or strategic searching. Existing methods that train these agents with reinforcement learning based only on the final answer suffer from conflicting learning signals and sparse feedback, which limits performance gains and makes training inefficient.

What's the solution?

The researchers developed a thinking paradigm called Atomic Thought, which divides complex reasoning into small functional units. Each unit is scored by a Reasoning Reward Model (RRM), producing fine-grained feedback called Atomic Thought Rewards. Building on this, they created a reinforcement-learning framework called Atom-Searcher that combines Atomic Thought with this feedback. Atom-Searcher uses a curriculum-inspired reward schedule: early in training the rewards emphasize the quality of the thinking process itself, then gradually shift toward rewards for the final outcome, helping the model converge on effective reasoning paths much faster.
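The paper does not spell out the exact formula for this reward schedule, but the idea of starting with process-level feedback and shifting toward outcome rewards can be illustrated with a minimal sketch. Here we assume a simple linear decay of the process-reward weight over training; the function name and the linear schedule are illustrative, not taken from the paper.

```python
def curriculum_reward(atr, outcome_reward, step, total_steps, floor=0.0):
    """Blend process-level and outcome rewards over training.

    Illustrative sketch (not the paper's exact schedule): the weight on
    the process-level Atomic Thought Reward (ATR) decays linearly from
    1.0 down to `floor` as training progresses, so emphasis shifts from
    how the model reasons to whether the final answer is correct.
    """
    progress = min(step / total_steps, 1.0)
    w_process = (1.0 - progress) * (1.0 - floor) + floor
    return w_process * atr + (1.0 - w_process) * outcome_reward
```

With this schedule, a run at step 0 is rewarded entirely for the quality of its atomic thoughts, while a run at the final step is rewarded entirely for its outcome.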

Why it matters?

This work matters because it makes AI agents better at complex problem-solving by improving how they think and learn. It also lets the model spend more computation at test time when tackling a problem, gives reward models clear anchor points for supervising deep research tasks, and produces reasoning that is more interpretable and human-like.

Abstract

Large language models (LLMs) exhibit remarkable problem-solving abilities, but struggle with complex tasks due to static internal knowledge. Retrieval-Augmented Generation (RAG) enhances access to external information, yet remains limited in multi-hop reasoning and strategic search due to rigid workflows. Recent advancements in agentic deep research empower LLMs to autonomously reason, search, and synthesize information. However, current approaches relying on outcome-based reinforcement learning (RL) face critical issues such as conflicting gradients and reward sparsity, limiting performance gains and training efficiency. To address these, we first propose Atomic Thought, a novel LLM thinking paradigm that decomposes reasoning into fine-grained functional units. These units are supervised by Reasoning Reward Models (RRMs), which provide Atomic Thought Rewards (ATR) for fine-grained guidance. Building on this, we propose Atom-Searcher, a novel RL framework for agentic deep research that integrates Atomic Thought and ATR. Atom-Searcher uses a curriculum-inspired reward schedule, prioritizing process-level ATR early and transitioning to outcome rewards, accelerating convergence on effective reasoning paths. Experiments on seven benchmarks show consistent improvements over the state-of-the-art. Key advantages include: (1) Atom-Searcher scales computation at test-time. (2) Atomic Thought provides supervision anchors for RRMs, bridging deep research tasks and RRMs. (3) Atom-Searcher exhibits more interpretable, human-like reasoning patterns.