GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
Jiacheng Guo, Ling Yang, Peter Chen, Qixin Xiao, Yinjie Wang, Xinzhe Juan, Jiahao Qiu, Ke Shen, Mengdi Wang
2025-12-23
Summary
This paper introduces a new way to train AI agents, specifically Large Language Models, to be better at interacting with the real world.
What's the problem?
Training these AI agents is really expensive and relies on a fixed set of examples of how to interact with things. Getting enough good examples is hard, and the AI doesn't get to practice on things specifically designed to help it improve at its current skill level. It's like trying to learn to play basketball only by watching videos – you need to actually *play* to get better, and you need practice that matches your current abilities.
What's the solution?
The researchers created a system called GenEnv where the AI agent plays a game against a computer-generated environment. This environment isn't just random; it *learns* what the AI is good at and creates challenges that are just a little bit harder, pushing the AI to improve. Think of it as a personal trainer for AI. They use a special reward system that focuses on giving the AI tasks that are within its reach but still challenging. This system generates new training data as the AI learns, instead of relying on a pre-made dataset.
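To make the "tasks within reach but still challenging" idea concrete, here is a minimal sketch of what such a difficulty-aligned reward could look like. The summary does not give the paper's actual formula, so the function names, the rollout-based success estimate, and the exact reward shape below are assumptions for illustration, not GenEnv's implementation.

```python
def estimate_success_rate(agent, task, n_rollouts=8):
    """Empirically estimate how often the current agent solves a proposed task.
    `agent.attempt(task)` is a hypothetical interface returning True on success."""
    successes = sum(agent.attempt(task) for _ in range(n_rollouts))
    return successes / n_rollouts


def curriculum_reward(success_rate, alpha=0.5):
    """Illustrative difficulty-aligned reward for the task generator: highest
    when the agent's success rate sits near a target level alpha, i.e. the task
    is neither trivial nor impossible (the "zone of proximal development")."""
    return 1.0 - abs(success_rate - alpha)
```

Under this reading, the generator is paid for proposing tasks the agent solves about half the time, which keeps the curriculum tracking the agent's current skill level as it improves.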
Why it matters?
This is important because it makes training AI agents much more efficient. They can achieve performance comparable to much larger, more expensive models while using significantly less training data. This means we can build more capable AI agents without needing massive amounts of resources, opening the door to more widespread and accessible AI applications.
Abstract
Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv instantiates a data-evolving paradigm: the simulator acts as a dynamic curriculum policy, continuously generating tasks specifically tailored to the agent's "zone of proximal development". This process is guided by a simple but effective α-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to +40.3% over 7B baselines and matches or exceeds the average performance of larger models. Compared to Gemini 2.5 Pro-based offline data augmentation, GenEnv achieves better performance while using 3.3× less data. By shifting from static supervision to adaptive simulation, GenEnv provides a data-efficient pathway for scaling agent capabilities.
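Read schematically, the abstract describes a loop in which the simulator proposes tasks, the agent attempts them, the simulator is rewarded for hitting the agent's current difficulty sweet spot, and the agent then trains on the freshly generated interactions. The sketch below, reusing the hypothetical `curriculum_reward` and `estimate_success_rate` helpers from the earlier snippet, is only one plausible shape of that loop; the `simulator`/`agent` method names are invented for illustration and are not the paper's API.

```python
def co_evolve(agent, simulator, n_rounds=100, tasks_per_round=64, alpha=0.5):
    """Schematic co-evolution loop (assumed structure, not the paper's code):
    the simulator generates tasks, the agent attempts them, and both update."""
    for _ in range(n_rounds):
        tasks = simulator.generate_tasks(tasks_per_round)          # hypothetical API
        trajectories, generator_rewards = [], []
        for task in tasks:
            rate = estimate_success_rate(agent, task)              # helper from the sketch above
            generator_rewards.append(curriculum_reward(rate, alpha))
            trajectories.extend(agent.collect_trajectories(task))  # hypothetical API
        simulator.update(tasks, generator_rewards)  # simulator learns to target the difficulty sweet spot
        agent.update(trajectories)                  # agent learns from new, adaptively generated data
```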