AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

Zhiheng Xi, Jixuan Huang, Chenyang Liao, Baodai Huang, Honglin Guo, Jiaqi Liu, Rui Zheng, Junjie Ye, Jiazheng Zhang, Wenxiang Chen, Wei He, Yiwen Ding, Guanyu Li, Zehui Chen, Zhengyin Du, Xuesong Yao, Yufei Xu, Jiecao Chen, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang

2025-09-11

Summary

This paper introduces AgentGym-RL, a new system designed to train artificial intelligence agents powered by large language models to solve complex, multi-step tasks through trial and error, much as humans learn.

What's the problem?

Currently, it is difficult to build AI agents that can learn to solve complicated, real-world problems on their own, without heavy pre-programmed guidance. Existing methods typically depend on large sets of supervised examples, struggle to adapt to new situations, and become unstable when learning over many steps. Until now, there was no unified, all-in-one system for training these agents from scratch using reinforcement learning across diverse environments.

What's the solution?

The researchers created AgentGym-RL, a flexible framework for training these agents with reinforcement learning. It is designed to be easily adapted to a variety of tasks and supports mainstream reinforcement learning algorithms. They also developed a training method called ScalingInter-RL, which first caps the number of interactions per episode so the agent focuses on making good decisions quickly, then gradually lengthens the horizon so it can explore diverse strategies without getting stuck in a rut or collapsing on longer, more complex tasks (see the sketch below).
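
To make that staged-horizon idea concrete, here is a minimal sketch of how such a schedule could be wired into a training loop. All names and values (run_episode, update_policy, the 5 -> 10 -> 20 turn schedule) are illustrative assumptions, not the paper's actual API:

```python
# Minimal sketch of the ScalingInter-RL schedule described above: cap the
# interaction horizon early (favoring exploitation), then raise it in stages
# to encourage longer-horizon exploration. run_episode / update_policy are
# stand-ins for a real rollout and RL update (e.g. PPO), not the paper's API.
from typing import Callable, List, Tuple

Action, Observation = str, str
StepFn = Callable[[Action], Tuple[Observation, float, bool]]
PolicyFn = Callable[[Observation], Action]


def run_episode(env_step: StepFn, policy: PolicyFn, max_turns: int) -> List[Tuple]:
    """Roll out one multi-turn episode, truncated at max_turns interactions."""
    trajectory: List[Tuple] = []
    observation, done = "initial observation", False
    for _ in range(max_turns):
        action = policy(observation)                  # LLM proposes the next action
        observation, reward, done = env_step(action)  # environment responds
        trajectory.append((action, observation, reward))
        if done:
            break
    return trajectory


def scaling_inter_rl(policy: PolicyFn, env_step: StepFn, update_policy,
                     episodes_per_stage: int = 100,
                     horizon_schedule: Tuple[int, ...] = (5, 10, 20)) -> None:
    """Stage-wise training: each stage permits a longer interaction horizon."""
    for max_turns in horizon_schedule:                # e.g. 5 -> 10 -> 20 turns
        for _ in range(episodes_per_stage):
            trajectory = run_episode(env_step, policy, max_turns)
            update_policy(policy, trajectory)         # any mainstream RL update
```

Under this kind of schedule, the early short-horizon stages keep optimization stable, while later stages give the agent room to discover longer multi-turn strategies, which is the exploitation-to-exploration shift the paper describes.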

Why it matters?

This work is important because it provides a tool and a method for building more capable and adaptable AI agents. By allowing agents to learn through interaction and exploration, it moves us closer to creating AI that can tackle real-world problems without needing constant human supervision. The open-source release of AgentGym-RL will allow other researchers to build upon this work and accelerate progress in the field of intelligent agents.

Abstract

Developing autonomous LLM agents capable of making a series of intelligent decisions to solve complex, real-world tasks is a fast-evolving frontier. Like human cognitive development, agents are expected to acquire knowledge and skills through exploration and interaction with the environment. Despite advances, the community still lacks a unified, interactive reinforcement learning (RL) framework that can effectively train such agents from scratch -- without relying on supervised fine-tuning (SFT) -- across diverse and realistic environments. To bridge this gap, we introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL. The framework features a modular and decoupled architecture, ensuring high flexibility and extensibility. It encompasses a wide variety of real-world scenarios, and supports mainstream RL algorithms. Furthermore, we propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization. In early stages, it emphasizes exploitation by restricting the number of interactions, and gradually shifts towards exploration with larger horizons to encourage diverse problem-solving strategies. In this way, the agent develops more diverse behaviors and is less prone to collapse under long horizons. We perform extensive experiments to validate the stability and effectiveness of both the AgentGym-RL framework and the ScalingInter-RL approach. Our agents match or surpass commercial models on 27 tasks across diverse environments. We offer key insights and will open-source the complete AgentGym-RL framework -- including code and datasets -- to empower the research community in developing the next generation of intelligent agents.
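
The abstract highlights a "modular and decoupled architecture" spanning diverse environments. One common way to realize that, sketched below with hypothetical names that are not AgentGym-RL's actual interface, is a uniform environment protocol so any scenario can plug into the same RL trainer:

```python
# Hypothetical sketch of the "modular and decoupled architecture" the abstract
# mentions: one uniform environment protocol so web, game, or tool scenarios
# can all plug into the same RL trainer. Class and method names are assumed
# for illustration; they are not AgentGym-RL's actual interface.
from abc import ABC, abstractmethod
from typing import Tuple


class AgentEnvironment(ABC):
    """Uniform multi-turn interface shared by every scenario."""

    @abstractmethod
    def reset(self) -> str:
        """Start a new task and return the initial textual observation."""

    @abstractmethod
    def step(self, action: str) -> Tuple[str, float, bool]:
        """Apply the agent's action; return (observation, reward, done)."""


class EchoEnvironment(AgentEnvironment):
    """Toy example: the agent is rewarded once it says 'done'."""

    def reset(self) -> str:
        return "Say 'done' to finish the task."

    def step(self, action: str) -> Tuple[str, float, bool]:
        finished = "done" in action.lower()
        observation = "Task complete." if finished else "Keep going."
        return observation, (1.0 if finished else 0.0), finished
```

Keeping every scenario behind a single reset/step interface is what would let one training loop and one set of RL algorithms cover many tasks; adding a new environment would then only require implementing these two methods.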