Agent Learning via Early Experience
Kai Zhang, Xiangchao Chen, Bo Liu, Tianci Xue, Zeyi Liao, Zhihan Liu, Xiyao Wang, Yuting Ning, Zhaorun Chen, Xiaohan Fu, Jian Xie, Yuxuan Sun, Boyu Gou, Qi Qi, Zihang Meng, Jianwei Yang, Ning Zhang, Xian Li, Ashish Shah, Dat Huynh, Hengduo Li, Zi Yang
2025-10-10
Summary
This paper explores a new way to train AI agents, specifically language agents that interact with an environment, so they get better at complex tasks without relying heavily on humans showing them exactly what to do.
What's the problem?
Currently, training these agents is tough. Many real-world settings don't give clear 'rewards' for good behavior, and even when they do, learning through trial and error can take a very long time. Most agents are instead trained by watching humans perform tasks, but this is hard to scale, and because human demonstrations cover only a narrow range of scenarios, it doesn't prepare agents for situations they haven't seen before.
What's the solution?
The researchers propose a method called 'early experience.' Instead of relying on human experts or waiting for rewards, the agent learns from its *own* initial attempts at a task: the states those attempts lead to serve as the learning signal. They investigated two main strategies for using this data: first, helping the agent understand how the world works based on the states it encounters (implicit world modeling), and second, letting the agent learn from its own suboptimal actions to improve its reasoning (self-reflection). They tested this approach across eight diverse environments and several families of AI models; a rough sketch of the idea is given below.
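To make the idea concrete, here is a minimal, hypothetical sketch of what collecting 'early experience' could look like: the agent branches off the states it visits with its own candidate actions and records the resulting next states as reward-free supervision. The environment, action proposer, and data format below are toy stand-ins for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of early-experience collection: the agent tries its own
# candidate actions and records the resulting next states as supervision.
# All names here are illustrative stand-ins, not the paper's code.

import random
from dataclasses import dataclass

@dataclass
class Transition:
    state: str
    action: str
    next_state: str  # the "future state" used as the learning signal

def toy_env_step(state: str, action: str) -> str:
    """Stand-in for a real environment transition (e.g., a website or tool)."""
    return f"{state} -> after '{action}'"

def propose_actions(state: str, k: int = 3) -> list[str]:
    """Stand-in for the agent sampling its own candidate actions at a state."""
    return random.sample(["click", "scroll", "type", "back", "search"], k)

def collect_early_experience(seed_states: list[str], k: int = 3) -> list[Transition]:
    """For each state the agent encounters, execute its own candidate actions
    and log where they lead. No reward signal is required."""
    data = []
    for state in seed_states:
        for action in propose_actions(state, k):
            data.append(Transition(state, action, toy_env_step(state, action)))
    return data

if __name__ == "__main__":
    for t in collect_early_experience(["homepage", "search_results"], k=2):
        print(t)
```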
Why it matters?
This work is important because it offers a way to build more capable and adaptable AI agents. By learning from their own experiences early on, these agents can become more effective and generalize better to new situations. It also suggests a path towards combining this 'early experience' approach with traditional reinforcement learning, creating a more robust and efficient learning process.
Abstract
A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm we study two strategies of using such data: (1) Implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) Self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. We evaluate across eight diverse environments and multiple model families. Our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, positioning it as a practical bridge between imitation learning and fully experience-driven agents.
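The abstract's two strategies, implicit world modeling and self-reflection, can both be read as ways of turning such agent-generated transitions into ordinary supervised fine-tuning examples. The sketch below shows one plausible formatting; the prompt templates and field names are assumptions for illustration, and the reflection text would itself be produced by the model rather than written by hand.

```python
# Hypothetical formatting of early-experience transitions into SFT examples
# for the two strategies named in the abstract. Templates are illustrative.

def world_modeling_example(state: str, action: str, next_state: str) -> dict:
    """Implicit world modeling: train the model to predict the future state
    that an action leads to, grounding it in environment dynamics."""
    return {
        "prompt": (f"Current state:\n{state}\n\n"
                   f"Action taken:\n{action}\n\n"
                   "Predict the resulting state."),
        "target": next_state,
    }

def self_reflection_example(state: str, expert_action: str,
                            alt_action: str, alt_outcome: str,
                            reflection: str) -> dict:
    """Self-reflection: the model sees the outcome of its own suboptimal
    action and is trained to produce a reflection plus the better action."""
    return {
        "prompt": (f"Current state:\n{state}\n\n"
                   f"You tried:\n{alt_action}\nwhich led to:\n{alt_outcome}\n\n"
                   "Reflect on this outcome and choose a better action."),
        "target": f"{reflection}\nAction: {expert_action}",
    }
```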