VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Dongfu Jiang, Yi Lu, Zhuofeng Li, Zhiheng Lyu, Ping Nie, Haozhe Wang, Alex Su, Hui Chen, Kai Zou, Chao Du, Tianyu Pang, Wenhu Chen

2025-09-03

Summary

This paper introduces VerlTool, a new framework designed to improve how large language models (LLMs) learn to use tools to solve complex problems over multiple steps.

What's the problem?

Current methods for teaching LLMs to use tools, called Agentic Reinforcement Learning with Tool use (ARLT), are fragmented and inefficient. Each task usually requires its own purpose-built codebase, which makes it hard to share improvements and slows down research. These systems also tend to be slow because they execute tool calls synchronously, waiting for each call to finish before starting the next, and they are not easily extended to new kinds of tools or problems.

What's the solution?

VerlTool solves these problems with a standardized, modular system. It builds on existing work in Reinforcement Learning with Verifiable Rewards (RLVR) and provides a common interface for managing diverse tools such as code interpreters, search engines, and SQL databases. Crucially, it runs tool interactions asynchronously, letting many rollout steps proceed at the same time and nearly doubling throughput. The framework's plugin design makes adding a new tool easy, requiring only a lightweight Python definition.
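The plugin idea can be sketched as follows. The class names and the `execute` interface below are illustrative assumptions, not VerlTool's actual API; the point is that a new tool can amount to a small, self-contained Python definition.

```python
# Hypothetical sketch of a tool plugin. The BaseTool interface and class
# names are assumptions for illustration, not VerlTool's real API.
import subprocess


class BaseTool:
    """Minimal tool interface: take the model's action string, return an observation."""
    name = "base"

    def execute(self, action: str) -> str:
        raise NotImplementedError


class PythonInterpreterTool(BaseTool):
    """Runs a snippet of Python code and returns its output as the observation."""
    name = "python_interpreter"

    def execute(self, action: str) -> str:
        # Execute the code in a subprocess and capture stdout/stderr.
        result = subprocess.run(
            ["python", "-c", action],
            capture_output=True, text=True, timeout=10,
        )
        return result.stdout if result.returncode == 0 else result.stderr


tool = PythonInterpreterTool()
print(tool.execute("print(2 + 3)"))  # the observation fed back to the model
```

In a real ARLT loop, the string returned by `execute` would be appended to the model's context as observation tokens before the next generation step.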

Why it matters?

VerlTool is important because it makes it easier for researchers to develop and test new ways to improve LLMs' ability to use tools. By providing a unified and efficient platform, it encourages collaboration and faster progress in the field of tool-augmented AI, potentially leading to more powerful and versatile AI systems that can tackle a wider range of real-world tasks.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated success in enhancing LLM reasoning capabilities, but remains limited to single-turn interactions without tool integration. While recent Agentic Reinforcement Learning with Tool use (ARLT) approaches have emerged to address multi-turn tool interactions, existing works develop task-specific codebases that suffer from fragmentation, synchronous execution bottlenecks, and limited extensibility across domains. These inefficiencies hinder broader community adoption and algorithmic innovation. We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. VerlTool provides four key contributions: (1) upstream alignment with VeRL ensuring compatibility and simplified maintenance, (2) unified tool management via standardized APIs supporting diverse modalities including code execution, search, SQL databases, and vision processing, (3) asynchronous rollout execution achieving near 2× speedup by eliminating synchronization bottlenecks, and (4) comprehensive evaluation demonstrating competitive performance across 6 ARLT domains. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. We train and evaluate models on mathematical reasoning, knowledge QA, SQL generation, visual reasoning, web search, and software engineering tasks, achieving results comparable to specialized systems while providing unified training infrastructure. The modular plugin architecture enables rapid tool integration requiring only lightweight Python definitions, significantly reducing development overhead and providing a scalable foundation for tool-augmented RL research. Our code is open-sourced at https://github.com/TIGER-AI-Lab/verl-tool.
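The asynchronous-rollout idea behind contribution (3) can be illustrated with a toy model: tool calls for many trajectories run concurrently instead of one after another. This is a conceptual sketch of the scheduling principle, not VerlTool's actual implementation.

```python
# Toy model of synchronous vs. asynchronous rollout. A "tool call" here
# is just a sleep standing in for slow work (code execution, web search).
import asyncio
import time


async def tool_call(i: int) -> str:
    await asyncio.sleep(0.05)  # stand-in for a slow tool interaction
    return f"obs-{i}"


async def rollout_sequential(n: int) -> list:
    # Synchronous style: each tool call waits for the previous one to finish.
    return [await tool_call(i) for i in range(n)]


async def rollout_concurrent(n: int) -> list:
    # Asynchronous style: all pending tool calls are in flight at once.
    return await asyncio.gather(*(tool_call(i) for i in range(n)))


t0 = time.perf_counter()
asyncio.run(rollout_sequential(8))
t_sync = time.perf_counter() - t0

t0 = time.perf_counter()
obs = asyncio.run(rollout_concurrent(8))
t_async = time.perf_counter() - t0

print(f"sequential: {t_sync:.2f}s, concurrent: {t_async:.2f}s")
```

With 8 trajectories, the sequential version pays for all 8 waits back to back while the concurrent version overlaps them, which is the same bottleneck the paper removes to reach its reported near-2× speedup.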