Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
Cheng Yang, Xuemeng Yang, Licheng Wen, Daocheng Fu, Jianbiao Mei, Rong Wu, Pinlong Cai, Yufan Shen, Nianchen Deng, Botian Shi, Yu Qiao, Haifeng Li
2025-10-10
Summary
This paper introduces MUSE, a new way to build AI agents powered by large language models that can actually learn and improve as they work, unlike current agents, which stay fixed after their initial training.
What's the problem?
Current AI agents using large language models are good at many things, but they struggle with tasks that take a long time to complete because they can't learn from their mistakes or remember what worked well in the past. They're essentially 'stuck' with the knowledge they had when they were first created and can't adapt to new situations or improve their performance over time.
What's the solution?
The researchers created MUSE, which stands for 'Memory-based, Unified, Self-Evolving' agent. It works by giving the agent a special 'memory module' that organizes its experiences. After each step of a task, the agent reflects on what happened, turns that raw experience into structured, useful information, and stores it in the memory module. This allows the agent to learn from each task and get better at completing future tasks, even ones it hasn't seen before. Tested with only a lightweight Gemini-2.5 Flash model, MUSE set a new state-of-the-art result on the TAC long-horizon productivity benchmark by a significant margin.
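The execute-reflect-store loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual implementation: the class names (`MemoryModule`, `Experience`), the two memory levels, the keyword-based retrieval, and the `reflect` stub (which in MUSE would be an LLM call) are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass, field

@dataclass
class Experience:
    task: str      # which task this lesson came from
    lesson: str    # distilled, reusable takeaway

@dataclass
class MemoryModule:
    # Hypothetical two-level hierarchy standing in for the paper's
    # "diverse levels of experience"
    strategic: list = field(default_factory=list)   # high-level planning lessons
    procedural: list = field(default_factory=list)  # step-level how-to lessons

    def integrate(self, exp: Experience, level: str = "procedural") -> None:
        # Fold a new structured experience back into the memory store
        target = self.strategic if level == "strategic" else self.procedural
        target.append(exp)

    def retrieve(self, task: str) -> list:
        # Naive substring match; a real system would use semantic retrieval
        return [e for e in self.strategic + self.procedural if task in e.task]

def reflect(trajectory: dict) -> Experience:
    # Placeholder for the LLM reflection step that converts a raw
    # trajectory into a structured experience
    last_action = trajectory["actions"][-1]
    return Experience(task=trajectory["task"],
                      lesson=f"For '{trajectory['task']}', end with: {last_action}")

# One iteration of the loop: execute a sub-task, reflect, integrate, reuse
memory = MemoryModule()
trajectory = {"task": "export report", "actions": ["open app", "click export"]}
memory.integrate(reflect(trajectory))
hints = memory.retrieve("export report")  # experience available for the next attempt
```

On the next occurrence of a similar task, `retrieve` would surface the stored lesson before planning, which is how accumulated experience can yield zero-shot improvement on unseen tasks.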
Why it matters?
This research is important because it represents a big step towards creating AI agents that can truly automate real-world tasks. Instead of needing constant human supervision or reprogramming, these agents can learn and improve on their own, making them much more useful and efficient for things like managing complex projects or assisting with daily work.
Abstract
Large Language Models have demonstrated remarkable capabilities across diverse domains, yet significant challenges persist when deploying them as AI agents for real-world long-horizon tasks. Existing LLM agents suffer from a critical limitation: they are test-time static and cannot learn from experience, lacking the ability to accumulate knowledge and continuously improve on the job. To address this challenge, we propose MUSE, a novel agent framework that introduces an experience-driven, self-evolving system centered around a hierarchical Memory Module. MUSE organizes diverse levels of experience and leverages them to plan and execute long-horizon tasks across multiple applications. After each sub-task execution, the agent autonomously reflects on its trajectory, converting the raw trajectory into structured experience and integrating it back into the Memory Module. This mechanism enables the agent to evolve beyond its static pretrained parameters, fostering continuous learning and self-evolution. We evaluate MUSE on the long-horizon productivity benchmark TAC. It achieves new SOTA performance by a significant margin using only a lightweight Gemini-2.5 Flash model. Extensive experiments demonstrate that as the agent autonomously accumulates experience, it exhibits increasingly superior task completion capabilities, as well as robust continuous learning and self-evolution capabilities. Moreover, the accumulated experience from MUSE exhibits strong generalization properties, enabling zero-shot improvement on new tasks. MUSE establishes a new paradigm for AI agents capable of real-world productivity task automation.