MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework
Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu
2025-09-18
Summary
This paper focuses on building a better AI system, specifically a 'deep research agent', for the medical field. These agents use large language models to answer complex questions by searching for and combining information, but existing systems struggle with the specialized knowledge needed for medicine.
What's the problem?
Current deep research agents, even leading proprietary ones, achieve only limited accuracy on complex medical benchmarks. The paper points to two gaps: the underlying models don't have enough dense medical knowledge for clinical reasoning, and the agents lack retrieval tools built for medical content. As a result, they struggle with questions that require connecting multiple pieces of medical information to reach an answer.
What's the solution?
The researchers created a new medical research agent called MedResearcher-R1-32B. They improved it in two main ways. First, they built a framework that automatically generates challenging multi-step medical questions and answers from medical knowledge graphs, starting from rare medical entities. Second, they gave the agent access to a custom search engine designed specifically for medical information, alongside general-purpose tools. They then trained the agent in two stages: supervised fine-tuning on example trajectories, followed by online reinforcement learning with a composite reward.
Why it matters?
This work shows that you don't necessarily need a huge, expensive AI model to excel in a specific field like medicine. By focusing on specialized knowledge, tools, and training data, the researchers created an open-source model that outperforms much larger proprietary systems on medical benchmarks while staying competitive on general deep research tasks. This is important because it makes advanced medical AI more accessible and potentially more reliable.
Abstract
Recent developments in Large Language Model (LLM)-based agents have shown impressive capabilities spanning multiple domains, exemplified by deep research systems that demonstrate superior performance on complex information-seeking and synthesis tasks. While general-purpose deep research agents have shown impressive capabilities, they struggle significantly with medical domain challenges, as evidenced by leading proprietary systems achieving limited accuracy on complex medical benchmarks. The key limitations are: (1) the model lacks sufficient dense medical knowledge for clinical reasoning, and (2) the framework is constrained by the absence of specialized retrieval tools tailored for medical contexts.
We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting the longest chains from subgraphs around rare medical entities to generate complex multi-hop question-answer pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2100+ diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions.
Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our MedResearcher-R1-32B model demonstrates exceptional performance, establishing new state-of-the-art results on medical benchmarks while maintaining competitive performance on general deep research tasks. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains.
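To make the "longest chain" idea in the abstract more concrete, here is a minimal Python sketch of how a multi-hop question-answer pair could be derived from a knowledge graph. It is an illustration under assumptions, not the authors' implementation: the toy triples, the entity and relation names, the `max_hops` limit, and the helper names `build_adjacency`, `longest_chain`, and `chain_to_qa` are all hypothetical. How rare entities are selected and how the surrounding subgraph is extracted are not shown.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples.
# All names are illustrative placeholders, not data from the paper.
TRIPLES = [
    ("disease_X", "caused_by", "gene_A"),
    ("gene_A", "encodes", "protein_B"),
    ("protein_B", "targeted_by", "drug_C"),
    ("disease_X", "presents_with", "symptom_D"),
    ("drug_C", "has_side_effect", "symptom_E"),
]


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


def longest_chain(adj, start, max_hops=4):
    """Depth-first search for the longest simple path starting at `start`.

    Returns a list of (head, relation, tail) edges. `max_hops` is an
    assumed cap that keeps the search small on larger graphs.
    """
    best = []

    def dfs(node, path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        if len(path) >= max_hops:
            return
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append((node, rel, nxt))
                dfs(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(start, [], {start})
    return best


def chain_to_qa(chain):
    """Turn a chain into a question that hides the intermediate entities."""
    start = chain[0][0]
    relations = [rel for _, rel, _ in chain]
    answer = chain[-1][2]
    question = (
        f"Starting from {start}, follow the relations "
        + " -> ".join(relations)
        + ". Which entity is reached?"
    )
    return question, answer


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    chain = longest_chain(adj, "disease_X")
    question, answer = chain_to_qa(chain)
    print(question)
    print(answer)
```

Because the question names only the starting entity and the relations, answering it requires resolving each intermediate hop in turn, which is the multi-hop property the abstract describes. A real pipeline would presumably rephrase such a template into natural clinical language; that step is omitted here.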