Don't Just Fine-tune the Agent, Tune the Environment

Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin

2025-10-14

Summary

This paper introduces a new way to train AI agents to use external tools for complex, multi-step problems, with a focus on making the training process more data-efficient and stable.

What's the problem?

Training these AI agents is hard because it requires a lot of example data, which is expensive and time-consuming to create. Simply showing the AI examples of how to solve problems (supervised learning) leads to it memorizing those examples instead of truly learning. Traditional reinforcement learning, where the AI learns by trial and error, struggles to get started and can be unstable during training.

What's the solution?

The researchers developed a method called 'Environment Tuning'. Instead of relying on pre-made example solutions, this method lets the AI learn directly from the problems themselves. It does this by gradually increasing the difficulty of the problems, giving the AI actionable feedback when it makes mistakes, and rewarding it for partial progress rather than only for final success. They trained their agent on just 400 problems, yet it performed very well, even on problems it hadn't seen before.
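To make the three ingredients concrete, here is a minimal sketch in Python. The function names, hint messages, and difficulty fields are illustrative assumptions, not the paper's actual implementation: `curriculum` orders problems from easy to hard, `corrective_feedback` turns a raw failure into an actionable hint, and `progress_reward` scores partial completion instead of a single pass/fail signal.

```python
def progress_reward(completed_steps, total_steps):
    # Fine-grained reward: the fraction of sub-goals completed,
    # instead of a sparse 0/1 reward at the end of the episode.
    return completed_steps / total_steps

def corrective_feedback(error_type):
    # Hypothetical environment augmentation: map a raw error into
    # a hint the agent can condition on in the next turn.
    hints = {
        "bad_args": "Check the tool's required parameters and types.",
        "wrong_tool": "This tool cannot answer the query; inspect the tool list first.",
    }
    return hints.get(error_type, "Unexpected error; retry with a simpler call.")

def curriculum(problems, n_stages=3):
    # Structured curriculum: yield batches of problems from
    # easiest to hardest, one batch per training stage.
    ranked = sorted(problems, key=lambda p: p["difficulty"])
    stage_size = len(ranked) // n_stages
    for s in range(n_stages):
        yield ranked[s * stage_size : (s + 1) * stage_size]
```

In a real training loop, the agent would roll out on each curriculum stage in turn, receive `corrective_feedback` strings appended to its context after failed tool calls, and be optimized against `progress_reward` rather than a binary success signal.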

Why it matters?

This research is important because it offers a new approach to training AI agents that requires less data and results in agents that are more adaptable and robust. It moves away from simply showing the AI what to do and towards letting it learn through interaction and exploration, which could lead to more powerful and reliable AI systems in the future.

Abstract

Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads to overfitting, whereas standard reinforcement learning (RL) struggles with a critical cold-start problem and training instability. To address these challenges, we introduce Environment Tuning, a novel training paradigm that enables agents to learn complex behaviors directly from problem instances without relying on pre-collected expert trajectories. Environment Tuning orchestrates this learning process through a structured curriculum, actionable environment augmentation that provides corrective feedback, and fine-grained progress rewards to ensure stable and efficient exploration. Using only 400 problem instances from the Berkeley Function-Calling Leaderboard (BFCL) benchmark, our method not only achieves competitive in-distribution performance against strong baselines but also demonstrates superior out-of-distribution generalization, overcoming the performance collapse common to SFT-based approaches. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration, paving the way for training more robust and data-efficient agents.