Starting from a strong base model, DR Tulu is trained in multiple stages: prompt curation, supervised fine-tuning on teacher-generated trajectories to establish foundational research skills, and reinforcement learning with evolving rewards that target tool usage, synthesis quality, and citation behavior. The model is designed to integrate with a flexible agent stack, dynamically choosing among search and browsing tools at each step so it can gather and synthesize information from diverse sources efficiently.
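To make the tool-selection loop concrete, here is a minimal sketch of an agent step that dispatches a model-chosen tool call. Everything here is an illustrative assumption: the tool names (web_search, browse_page), the action format, and the placeholder tool bodies are hypothetical and do not reflect DR Tulu's actual interface.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Tool:
    """A named tool the policy model can choose to invoke."""
    name: str
    description: str
    run: Callable[[str], Awaitable[str]]


async def web_search(query: str) -> str:
    # Placeholder: a real implementation would call a search API.
    return f"[search results for: {query}]"


async def browse_page(url: str) -> str:
    # Placeholder: a real implementation would fetch and extract page text.
    return f"[page contents of: {url}]"


TOOLS = {
    "web_search": Tool("web_search", "Search the web for a query.", web_search),
    "browse_page": Tool("browse_page", "Fetch and read a specific URL.", browse_page),
}


async def research_step(model_action: dict) -> str:
    """Dispatch one tool call chosen by the model.

    `model_action` is assumed to be parsed model output of the form
    {"tool": <name>, "argument": <string>}; at each step of the research
    loop the policy decides which tool to invoke and with what argument.
    """
    tool = TOOLS[model_action["tool"]]
    return await tool.run(model_action["argument"])


async def main() -> None:
    # Simulated two-step trajectory: search first, then browse a result.
    print(await research_step({"tool": "web_search", "argument": "DR Tulu training recipe"}))
    print(await research_step({"tool": "browse_page", "argument": "https://example.org/report"}))


if __name__ == "__main__":
    asyncio.run(main())
```

Because tools are registered behind a uniform async interface, swapping in a different search backend or adding a domain-specific retriever only changes the registry, not the agent loop.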
One of DR Tulu's standout features is its modularity and extensibility: it ships with an agent library, dr-agent-lib, that provides a multi-tool asynchronous calling framework with built-in concurrency management and caching. Users can deploy the agent with their own custom tool stacks, reproduce results from the released training recipes and checkpoints, and extend the model by plugging in domain-specific retrieval systems without retraining. DR Tulu's best-performing 8-billion-parameter model demonstrates notable improvements over larger proprietary systems on rigorous benchmarks, all while remaining cost-effective and flexible to deploy.
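The concurrency-and-caching pattern the library is described as providing can be illustrated with a small self-contained sketch. This is not dr-agent-lib's real API; CachedToolRunner, its method names, and the default limits are assumptions made for illustration.

```python
import asyncio
import hashlib
from typing import Awaitable, Callable


class CachedToolRunner:
    """Hypothetical sketch of bounded-concurrency tool calling with a cache.

    Repeated calls with the same (tool, argument) pair are served once:
    a per-key lock deduplicates identical in-flight requests, and a
    semaphore caps how many tool calls run concurrently.
    """

    def __init__(self, max_concurrency: int = 8):
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self._cache: dict[str, str] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    @staticmethod
    def _key(tool_name: str, argument: str) -> str:
        return hashlib.sha256(f"{tool_name}\x00{argument}".encode()).hexdigest()

    async def call(
        self,
        tool_name: str,
        argument: str,
        fn: Callable[[str], Awaitable[str]],
    ) -> str:
        key = self._key(tool_name, argument)
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:  # duplicate requests wait for the first to finish
            if key in self._cache:
                return self._cache[key]
            async with self._semaphore:  # bound concurrent tool traffic
                result = await fn(argument)
            self._cache[key] = result
            return result


async def fake_search(query: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for network latency
    return f"results for {query!r}"


async def main() -> None:
    runner = CachedToolRunner(max_concurrency=4)
    # Issue duplicate queries concurrently; each unique query hits the
    # underlying tool exactly once, the rest are served from the cache.
    results = await asyncio.gather(
        *(runner.call("search", q, fake_search) for q in ["a", "b", "a", "b"])
    )
    print(results)


if __name__ == "__main__":
    asyncio.run(main())
```

Keying the cache on the tool name plus its argument is what lets a custom tool stack reuse the same runner unchanged: a new retriever only needs to expose an async callable.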

