Mind DeepResearch Technical Report
MindDR Team, Li Auto Inc
2026-04-20
Summary
This paper introduces MindDR, a new system designed to perform complex research tasks using artificial intelligence. It is built around multiple AI 'agents' working together, and it achieves strong results with AI models considerably smaller than those used by comparable systems.
What's the problem?
Performing in-depth research online requires a system that can plan, search effectively, and then summarize the findings in a clear report. Existing AI systems often need to be incredibly large and computationally expensive to do this well, and evaluating their performance is tricky because simple metrics don't capture the full picture of good research.
What's the solution?
The researchers created MindDR, which uses three specialized AI agents: one that plans the research, one that searches the internet for information, and one that writes the final report. The agents are trained in stages, starting with basic instruction following and then improving through trial and error guided by rewards for good performance. The researchers also created a new benchmark, MindDR Bench, built from real user queries and a detailed scoring system, to better assess research quality.
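As a rough illustration, the plan-search-write loop described above can be sketched as follows. All class and method names here are hypothetical stand-ins for exposition, not MindDR's actual interfaces, and the agents are stubbed rather than backed by real language models or search tools:

```python
class PlanningAgent:
    """Decomposes a research query into sub-questions (stubbed)."""

    def plan(self, query: str) -> list[str]:
        # A real planner would call an LLM; here we return fixed sub-questions.
        return [f"background of {query}", f"recent findings on {query}"]


class SearchAgent:
    """Gathers evidence for each sub-question (stubbed)."""

    def search(self, sub_question: str) -> str:
        # A real search agent would issue web queries and read results.
        return f"evidence for: {sub_question}"


class ReportAgent:
    """Composes the collected evidence into a final report."""

    def write(self, query: str, evidence: list[str]) -> str:
        body = "\n".join(f"- {item}" for item in evidence)
        return f"Report on {query}:\n{body}"


def deep_research(query: str) -> str:
    """Run the three agents in sequence: plan, search, report."""
    sub_questions = PlanningAgent().plan(query)
    evidence = [SearchAgent().search(q) for q in sub_questions]
    return ReportAgent().write(query, evidence)


print(deep_research("efficient multi-agent research systems"))
```

The point of the sketch is the division of labor: each agent has a narrow responsibility, which is what makes stage-wise, agent-specialized training (as described above) possible.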
Why does it matter?
MindDR is important because it shows that powerful research AI doesn't *always* require massive AI models. By carefully designing the system and the training process, the team achieved results comparable to those of much larger systems. This makes advanced research AI more accessible and practical, and its deployment in a real-world product at Li Auto demonstrates its usefulness.
Abstract
We present Mind DeepResearch (MindDR), an efficient multi-agent deep research framework that achieves leading performance with only ~30B-parameter models through a meticulously designed data synthesis and multi-stage training pipeline. The core of MindDR is a collaborative three-agent architecture (Planning Agent, DeepSearch Agent, and Report Agent) trained with a four-stage, agent-specialized pipeline comprising SFT cold-start, Search-RL, Report-RL, and preference alignment. With this regime, MindDR achieves 45.7% on BrowseComp-ZH, 42.8% on BrowseComp, 46.5% on WideSearch, 75.0% on xbench-DS, and 52.5 on DeepResearch Bench, outperforming comparable-scale open-source agent systems and rivaling larger-scale models. MindDR has been deployed as an online product at Li Auto. Furthermore, we introduce MindDR Bench, a curated benchmark of 500 real-world Chinese queries drawn from our internal product's user interactions, evaluated with a comprehensive multi-dimensional rubric rather than a single RACE metric. On MindDR Bench, MindDR achieves a state-of-the-art score of 51.8.