Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs

Shreyas Singh, Kunal Singh, Pradeep Moturi

2025-10-08

Summary

This paper introduces Fathom-DeepResearch, a system for carrying out complex research tasks on the web. It consists of two models: one that searches the web and one that writes up the findings.

What's the problem?

Current AI agents struggle with tasks that require extensive research and reasoning, especially when they need to use tools like web search to gather information. They often get stuck, don't use search effectively, or can't properly combine information from different sources to create a well-supported answer. Existing systems also have trouble knowing when to stop searching and start writing a report.

What's the solution?

The researchers built Fathom-DeepResearch from two components. First, Fathom-Search-4B searches the web, issuing targeted queries and reading relevant webpages. It was trained on DUETQA, a new 5K-sample dataset whose questions cannot be answered without web search and require evidence from several different sources, using RAPO, a reinforcement learning method that extends GRPO. A step-level reward that classifies each tool call also gives explicit control over how broad, deep, and long the search is, which helps the model decide how far to take its research. Second, Fathom-Synthesizer-4B turns the multi-turn search traces into a structured, citation-dense report. Both models are trained from Qwen3-4B.
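The division of labor described above can be pictured as a two-stage pipeline: a search loop that produces a trace, then a synthesis step that turns the trace into a cited report. The Python sketch below is only an illustration under stated assumptions: `toy_policy` and `toy_search` are hypothetical stand-ins for the trained Fathom-Search model and a live web-search tool, `max_calls` is an arbitrary safety budget, and `synthesize` merely formats snippets with numbered citations, whereas Fathom-Synthesizer is a trained model that writes the report.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class ToolCall:
    query: str
    url: str
    snippet: str

@dataclass
class Trace:
    question: str
    calls: List[ToolCall] = field(default_factory=list)

def deep_search(question: str,
                policy: Callable[[Trace], Optional[str]],
                web_search: Callable[[str], Tuple[str, str]],
                max_calls: int) -> Trace:
    """Stage 1: the policy proposes the next query, or None once it has enough evidence."""
    trace = Trace(question)
    while len(trace.calls) < max_calls:  # hard cap on tool calls (hypothetical budget)
        query = policy(trace)
        if query is None:
            break
        url, snippet = web_search(query)
        trace.calls.append(ToolCall(query, url, snippet))
    return trace

def synthesize(trace: Trace) -> str:
    """Stage 2: turn a finished trace into a report with numbered, deduplicated citations."""
    sources: List[str] = []
    body = []
    for call in trace.calls:
        if call.url not in sources:
            sources.append(call.url)
        body.append(f"- {call.snippet} [{sources.index(call.url) + 1}]")
    refs = [f"[{i}] {url}" for i, url in enumerate(sources, start=1)]
    return "\n".join([f"Report: {trace.question}", *body, "References", *refs])

# Toy stand-ins (hypothetical): a scripted policy and a fake search tool.
SCRIPT = ["Qwen3-4B model card", "GRPO algorithm"]

def toy_policy(trace: Trace) -> Optional[str]:
    n = len(trace.calls)
    return SCRIPT[n] if n < len(SCRIPT) else None

def toy_search(query: str) -> Tuple[str, str]:
    slug = query.split()[0].lower()
    return f"https://example.com/{slug}", f"Snippet about {query}"

trace = deep_search("What is Fathom-Search-4B built on?", toy_policy, toy_search, max_calls=8)
report = synthesize(trace)
print(report)
```

Keeping the stop decision inside `policy` reflects the idea that the search model itself judges when it has gathered enough evidence; the `max_calls` cap only guards against runaway loops.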

Why does it matter?

This work matters because it extends what small language models can do as agents. A system that performs in-depth research is a step toward AI that can help with complex tasks requiring information to be gathered from the real world and synthesized. Fathom-DeepResearch achieves state-of-the-art results among open-weights systems on DeepSearch benchmarks and DeepResearch-Bench, and it generalizes to a range of challenging reasoning tasks, including HLE, AIME-25, GPQA-Diamond, and MedQA.

Abstract

Tool-integrated reasoning has emerged as a key focus for enabling agentic applications. Among these, DeepResearch Agents have gained significant attention for their strong performance on complex, open-ended information-seeking tasks. We introduce Fathom-DeepResearch, an agentic system composed of two specialized models. The first is Fathom-Search-4B, a DeepSearch model trained from Qwen3-4B and optimized for evidence-based investigation through live web search and targeted webpage querying. Its training combines three advances: (i) DUETQA, a 5K-sample dataset generated via multi-agent self-play that enforces strict web-search dependence and heterogeneous source grounding; (ii) RAPO, a zero-overhead extension of GRPO that stabilizes multi-turn Reinforcement Learning with Verifiable Rewards through curriculum pruning, reward-aware advantage scaling, and per-prompt replay buffers; and (iii) a steerable step-level reward that classifies each tool call by cognitive behavior and marginal utility, enabling explicit control over search trajectory breadth, depth, and horizon. These improvements enable reliable extension of tool-calling beyond 20 calls when warranted. The second is Fathom-Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports for comprehensive synthesis. Evaluated on DeepSearch benchmarks (SimpleQA, FRAMES, WebWalker, Seal0, MuSiQue) and DeepResearch-Bench, the system achieves state-of-the-art performance in the open-weights category while demonstrating strong generalization to diverse reasoning tasks including HLE, AIME-25, GPQA-Diamond, and MedQA.
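To make the RAPO components more concrete, here is a minimal Python sketch. Only the group-normalized advantage is standard GRPO. `prune_solved` and `ReplayBuffer` are hypothetical helpers encoding one plausible reading of "curriculum pruning" and "per-prompt replay buffers" (drop prompts that every rollout already solves; reuse an earlier successful rollout when a whole group fails), assuming binary rewards of 0 or 1. The abstract does not give RAPO's exact rules, and reward-aware advantage scaling is not modeled here.

```python
import statistics
from typing import Dict, List, Tuple

def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each rollout's reward within its prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # If every rollout got the same reward, all advantages are 0: no learning signal.
    return [(r - mean) / (std + eps) for r in rewards]

def prune_solved(pass_rates: Dict[str, float]) -> List[str]:
    """Assumed curriculum pruning: drop prompts that every rollout already solves."""
    return [pid for pid, rate in pass_rates.items() if rate < 1.0]

class ReplayBuffer:
    """Assumed per-prompt buffer holding the latest successful rollout for each prompt."""

    def __init__(self) -> None:
        self.success: Dict[str, str] = {}

    def update(self, prompt_id: str, rollouts: List[str], rewards: List[float]) -> None:
        for text, reward in zip(rollouts, rewards):
            if reward == 1.0:
                self.success[prompt_id] = text

    def rescue(self, prompt_id: str, rollouts: List[str],
               rewards: List[float]) -> Tuple[List[str], List[float]]:
        """If the whole group failed, swap the last rollout for a stored success."""
        if max(rewards) == 0.0 and prompt_id in self.success:
            return rollouts[:-1] + [self.success[prompt_id]], rewards[:-1] + [1.0]
        return rollouts, rewards

buffer = ReplayBuffer()
buffer.update("q1", ["earlier correct trace"], [1.0])
rollouts, rewards = buffer.rescue("q1", ["trace a", "trace b"], [0.0, 0.0])
print(rewards, group_advantages(rewards))
```

The point the sketch illustrates is that GRPO learns nothing from a group whose rewards are all equal, so both assumed mechanisms aim to keep each group's rewards mixed: pruning removes groups that would be all successes, and replay rescues groups that would be all failures.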