Towards General Agentic Intelligence via Environment Scaling
Runnan Fang, Shihao Cai, Baixuan Li, Jialong Wu, Guangyu Li, Wenbiao Yin, Xinyu Wang, Xiaobin Wang, Liangcai Su, Zhen Zhang, Shibin Wu, Zhengwei Tao, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
2025-09-17
Summary
This paper focuses on making Large Language Models (LLMs) more useful in the real world by improving their ability to use tools and APIs, essentially making them better 'agents'.
What's the problem?
To be truly helpful, LLMs must be able to call functions reliably and accurately (for example, using an API to book a flight or send an email). However, training them to do this well across many different situations is hard. The two main challenges are creating enough diverse practice environments for the LLM, and training the LLM effectively on the experiences gathered from those environments.
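To make "calling functions" concrete, here is a minimal sketch of the JSON-schema style of tool use common to many function-calling APIs. The tool name `book_flight`, its fields, and the dispatcher are illustrative assumptions, not details from the paper: the agent emits a structured call, and a runtime executes it against a registry of real or simulated backends.

```python
import json

# Hypothetical tool schema in the JSON-schema style used by many
# function-calling APIs; names and fields are illustrative only.
BOOK_FLIGHT_TOOL = {
    "name": "book_flight",
    "description": "Book a flight between two cities on a given date.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string", "description": "YYYY-MM-DD"},
        },
        "required": ["origin", "destination", "date"],
    },
}

def book_flight(origin: str, destination: str, date: str) -> dict:
    """Toy backend: in a simulated environment this would mutate env state."""
    return {"status": "confirmed", "route": f"{origin}->{destination}", "date": date}

TOOL_REGISTRY = {"book_flight": book_flight}

def dispatch(tool_call_json: str) -> dict:
    """Parse a model-emitted tool call and execute it against the registry."""
    call = json.loads(tool_call_json)
    fn = TOOL_REGISTRY[call["name"]]
    return fn(**call["arguments"])

# A model that has learned function calling would emit something like:
result = dispatch('{"name": "book_flight", "arguments": '
                  '{"origin": "SFO", "destination": "JFK", "date": "2025-09-17"}}')
print(result["status"])  # confirmed
```

Getting this right across many tools and domains, rather than one hand-written schema, is exactly the capability the paper aims to train.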
What's the solution?
The researchers created a system that automatically builds many different simulated environments where an LLM agent can practice using functions. They then used a two-step training process: first, they gave the LLM basic agent skills, and then they specialized it for specific tasks or areas. The resulting model is called AgentScaler.
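The practice loop behind this can be sketched as follows. This is an illustrative toy, not the paper's code: `SimulatedEnv`, `collect_trajectory`, and the scripted policy are assumed names standing in for the fully simulated environments and the LLM agent, showing how tool-call trajectories could be gathered at scale as fine-tuning data.

```python
# Illustrative sketch of collecting tool-use trajectories from many
# simulated environments, the raw material for the two-phase fine-tuning.

class SimulatedEnv:
    """A toy fully-simulated environment: a task plus simulated tool backends."""
    def __init__(self, task: str, tools: dict):
        self.task = task
        self.tools = tools
        self.done = False

    def step(self, tool_name: str, **kwargs) -> dict:
        result = self.tools[tool_name](**kwargs)
        self.done = True  # toy simplification: one successful call ends the task
        return result

def collect_trajectory(env: SimulatedEnv, policy) -> list:
    """Roll out a policy in an environment, logging (task, call, result) steps."""
    trajectory = []
    while not env.done:
        name, args = policy(env.task)
        result = env.step(name, **args)
        trajectory.append({"task": env.task, "call": (name, args), "result": result})
    return trajectory

# Hypothetical scripted policy standing in for an LLM agent.
def scripted_policy(task: str):
    return "lookup", {"query": task}

envs = [SimulatedEnv(f"task-{i}", {"lookup": lambda query: {"ok": True, "query": query}})
        for i in range(3)]
dataset = [step for env in envs for step in collect_trajectory(env, scripted_policy)]
print(len(dataset))  # 3
```

Because the environments are simulated, the construction step can be automated and scaled; the two training phases would then fine-tune on general trajectories first, and domain-specific ones second.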
Why it matters?
This work is important because it helps LLMs become more capable and practical. By improving their ability to use tools and APIs, we can unlock a wider range of real-world applications for these powerful models, making them more than just text generators.
Abstract
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which requires agents to develop these capabilities through interaction in varied environments. The breadth of function-calling competence is closely tied to the diversity of environments in which agents are trained. In this work, we scale up environments as a step towards advancing general agentic intelligence. This gives rise to two central challenges: (i) how to scale environments in a principled manner, and (ii) how to effectively train agentic capabilities from experiences derived through interactions with these environments. To address these, we design a scalable framework that automatically constructs heterogeneous, fully simulated environments, systematically broadening the space of function-calling scenarios. We further adopt a two-phase agent fine-tuning strategy: first endowing agents with fundamental agentic capabilities, then specializing them for domain-specific contexts. Extensive experiments on the agentic benchmarks tau-bench, tau2-Bench, and ACEBench demonstrate that our trained model, AgentScaler, significantly enhances the function-calling capability of models.