DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
Venus Team, Sunhao Dai, Yong Deng, Jinzhen Lin, Yusheng Song, Guoqing Wang, Xiaofeng Wu, Yuqi Zhou, Shuo Yang, Zhenzhe Ying, Zhanwei Zhang, Changhua Meng, Weiqiang Wang
2026-04-23
Summary
This paper focuses on creating powerful AI 'agents' that can carry out complex research tasks while being small enough to run on everyday devices like phones or laptops, rather than requiring huge server farms.
What's the problem?
Building these small, capable research agents is difficult because they need a lot of training data to learn effectively. Getting enough high-quality, open-source data for training is a major challenge, and small models can struggle to learn from limited information. Existing small agents aren't very good at tackling long, complicated research projects.
What's the solution?
The researchers developed an agent called DR-Venus, which is relatively small at 4 billion parameters. They trained it in two steps: first, they carefully cleaned and reorganized a dataset of about 10,000 examples to improve its quality, and then they used a special type of learning called reinforcement learning to help the agent become more reliable at completing research tasks. They also designed a reward system that encourages the agent to gather useful information at each step and follow a consistent format.
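The reward design described above can be sketched as a simple per-turn scoring function: reward the agent for surfacing genuinely new information at each step, and penalize turns that break the expected output format. All function names, the tag-based format check, and the weighting constants below are illustrative assumptions, not the paper's actual IGPO-based implementation.

```python
# Illustrative turn-level reward: information gain minus a format penalty.
# The scoring rules and constants here are assumptions for exposition.

def information_gain(known_facts: set, new_facts: set) -> float:
    """Fraction of this turn's retrieved facts that are genuinely new."""
    if not new_facts:
        return 0.0
    return len(new_facts - known_facts) / len(new_facts)

def format_ok(turn: str) -> bool:
    """Toy format check: the turn must wrap its action in tags."""
    return "<action>" in turn and "</action>" in turn

def turn_reward(known_facts: set, new_facts: set, turn: str,
                alpha: float = 1.0, beta: float = 0.5) -> float:
    """Combine the two signals: alpha * gain - beta * format violation."""
    gain = information_gain(known_facts, new_facts)
    penalty = 0.0 if format_ok(turn) else 1.0
    return alpha * gain - beta * penalty
```

Because the reward is computed per turn rather than once per trajectory, every step of a long research episode receives a supervision signal, which is the "supervision density" benefit the paper highlights.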
Why it matters?
This work shows that it's possible to create surprisingly effective AI research agents using relatively small models and limited data. This is important because smaller models are cheaper to run, faster, and more private than larger ones, making them practical for real-world use on a wider range of devices. It also suggests that even small models have a lot of potential, and that clever training techniques can unlock that potential.
Abstract
Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent with limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Built entirely on roughly 10K open-data examples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
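The first-stage idea of resampling long-horizon trajectories can be sketched as length-weighted sampling: scarce multi-turn trajectories are drawn more often per epoch than short ones, so the model sees proportionally more long-horizon behavior. The weighting rule (probability proportional to turn count) and the data layout below are assumptions for illustration, not the paper's exact recipe.

```python
import random

def resample(trajectories: list, k: int, seed: int = 0) -> list:
    """Draw k trajectories with replacement, weighting each by its
    number of turns so long-horizon examples are over-represented.
    The proportional-to-length weighting is an illustrative choice."""
    rng = random.Random(seed)
    weights = [len(t["turns"]) for t in trajectories]
    return rng.choices(trajectories, weights=weights, k=k)

# Usage: a 9-turn trajectory is sampled ~9x as often as a 1-turn one.
data = [
    {"id": "short", "turns": ["t1"]},
    {"id": "long", "turns": ["t1"] * 9},
]
epoch = resample(data, k=1000)
```

A uniform sampler would show the long trajectory in only half the draws; here it dominates, which is one simple way to improve utilization of the few long-horizon examples in a 10K-scale dataset.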