Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Hongjin Su, Ruoxi Sun, Jinsung Yoon, Pengcheng Yin, Tao Yu, Sercan Ö. Arık

2025-01-22

Summary

This paper talks about a new way to make AI assistants smarter and more helpful with everyday digital tasks. The researchers created a system called Learn-by-interact that helps AI learn from its own experiences, kind of like how humans learn by doing.

What's the problem?

Current AI assistants, which are based on large language models (LLMs), are pretty smart but they're not great at doing real-world tasks on computers or phones. This is because they don't have enough good examples of how to interact with different apps and websites. It's hard for humans to provide all these examples because it would take too much time and effort.

What's the solution?

The researchers came up with Learn-by-interact, which is like a training program for AI. Instead of humans teaching the AI, Learn-by-interact lets the AI figure things out on its own. It does this by having the AI try out different tasks, like sending emails or analyzing data, in a simulated environment. The AI then looks back at what it did, figures out what worked well, and creates its own instructions for how to do similar tasks in the future. Because the instructions are derived backward from the interaction history, this process is called 'backward construction.' The researchers tested their system on different types of tasks, including coding, web browsing, and using desktop applications.
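The loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Step` structure, the function names, and the toy `summarize` stand-in (which in the real system would be an LLM call that abstracts the trajectory into an instruction) are all assumptions made for clarity.

```python
# Hedged sketch of "backward construction": after the agent interacts with an
# environment, the trajectory is summarized into a new (instruction,
# trajectory) pair that can later serve as training or in-context data.
from dataclasses import dataclass


@dataclass
class Step:
    action: str       # e.g. "click('Compose')"
    observation: str  # environment feedback after the action


def summarize(trajectory: list[Step]) -> str:
    """Stand-in for an LLM call that abstracts a trajectory into an
    instruction; here we simply join the actions for illustration."""
    return "Task that required: " + "; ".join(s.action for s in trajectory)


def backward_construct(trajectory: list[Step]) -> tuple[str, list[Step]]:
    # Key idea: the instruction is written *after* the interaction, so every
    # synthesized pair is grounded in actions that actually ran.
    instruction = summarize(trajectory)
    return instruction, trajectory


traj = [Step("open('mail')", "inbox shown"),
        Step("click('Compose')", "draft opened")]
instruction, data = backward_construct(traj)
print(instruction)
```

The point of the toy example is the ordering: the environment interaction comes first, and the instruction is constructed from it afterward, rather than the usual forward direction of writing a task and hoping the agent can complete it.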

Why it matters?

This matters because it could make AI assistants much more helpful in our daily lives. If AI can learn how to use different apps and websites on its own, it could become really good at helping us with all sorts of digital tasks. The researchers found that their method made the AI significantly better at these tasks - improving performance by up to 19.5% in some cases. This could lead to AI assistants that can actually understand what we're trying to do on our devices and help us do it more efficiently. It's a big step towards having AI that can adapt to new situations and be truly useful in the real world.

Abstract

Autonomous agents powered by large language models (LLMs) have the potential to enhance human capabilities, assisting with digital tasks from sending emails to performing data analysis. The abilities of existing LLMs at such tasks are often hindered by the lack of high-quality agent data from the corresponding environments they interact with. We propose Learn-by-interact, a data-centric framework to adapt LLM agents to any given environments without human annotations. Learn-by-interact synthesizes trajectories of agent-environment interactions based on documentation, and constructs instructions by summarizing or abstracting the interaction histories, a process called backward construction. We assess the quality of our synthetic data by using them in both training-based scenarios and training-free in-context learning (ICL), where we craft innovative retrieval approaches optimized for agents. Extensive experiments on SWE-bench, WebArena, OSWorld and Spider2-V spanning across realistic coding, web, and desktop environments show the effectiveness of Learn-by-interact in various downstream agentic tasks -- baseline results are improved by up to 12.2% for ICL with Claude-3.5 and 19.5% for training with Codestral-22B. We further demonstrate the critical role of backward construction, which provides up to 14.0% improvement for training. Our ablation studies demonstrate the efficiency provided by our synthesized data in ICL and the superiority of our retrieval pipeline over alternative approaches like conventional retrieval-augmented generation (RAG). We expect that Learn-by-interact will serve as a foundation for agent data synthesis as LLMs are increasingly deployed in real-world environments.
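The abstract's training-free ICL setting retrieves synthesized (instruction, trajectory) pairs as in-context examples for a new task. A rough sketch of that idea follows; the word-overlap scoring and the corpus contents here are naive placeholders chosen for illustration, and the paper's agent-optimized retrieval pipeline is considerably more elaborate.

```python
# Toy retrieval over synthesized agent data: given a new task query, pick the
# most relevant previously synthesized (instruction, trajectory-summary) pairs
# to show the LLM as in-context examples.
def score(query: str, instruction: str) -> int:
    # Naive relevance: count shared lowercase words between query and instruction.
    return len(set(query.lower().split()) & set(instruction.lower().split()))


def retrieve(query: str, corpus: list[tuple[str, str]],
             k: int = 2) -> list[tuple[str, str]]:
    # corpus: (instruction, trajectory-summary) pairs produced by data synthesis
    return sorted(corpus, key=lambda pair: score(query, pair[0]),
                  reverse=True)[:k]


corpus = [
    ("send an email to the team", "open mail -> compose -> send"),
    ("plot monthly sales data", "load csv -> aggregate -> chart"),
    ("browse docs for an API", "open browser -> search -> read"),
]
examples = retrieve("email the project team an update", corpus, k=1)
print(examples[0][0])  # most relevant synthesized instruction
```

Swapping the overlap score for an embedding-based or agent-aware scorer changes only the `score` function; the surrounding retrieve-then-prompt structure is what distinguishes this ICL setup from fine-tuning on the same synthesized data.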