AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, Jun Wang

2025-08-25

Summary

This paper introduces a new way to help AI agents, powered by large language models, learn and improve from experience without the heavy computing cost of repeatedly retraining the underlying model.

What's the problem?

Currently, making AI agents smarter usually involves either carefully designing a fixed set of steps for them to follow, which isn't flexible, or constantly updating the AI model itself, which takes a lot of time and resources. Existing methods struggle to adapt to new situations efficiently and continuously.

What's the solution?

The researchers developed a system called AgentFly that lets an AI agent learn from its past experiences by storing them in an episodic memory. When faced with a new challenge, the agent retrieves similar past cases to decide what to do. The memory is updated based on environmental feedback, i.e. how well the agent performed, so the agent continually improves without changing the weights of the core AI model. An efficient retrieval mechanism finds the most relevant memories to use.
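To make the write/retrieve loop concrete, here is a minimal sketch of a non-parametric episodic memory. All names (`Case`, `EpisodicMemory`, the word-overlap similarity) are illustrative assumptions, not the paper's actual implementation, which uses learned retrieval over real task embeddings.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One stored experience: a task, the action taken, and its reward."""
    task: str
    action: str
    reward: float


class EpisodicMemory:
    """Toy case memory: write (task, action, reward) cases, then read back
    the cases most similar to a new task. Stands in for the paper's
    memory reading/rewriting mechanism."""

    def __init__(self):
        self.cases: list[Case] = []

    def write(self, task: str, action: str, reward: float) -> None:
        # Memory rewriting: record the outcome of an attempted action.
        self.cases.append(Case(task, action, reward))

    @staticmethod
    def _similarity(a: str, b: str) -> float:
        # Toy Jaccard word-overlap similarity; a real system would use
        # learned embeddings instead.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    def read(self, task: str, k: int = 2) -> list[Case]:
        # Memory reading: top-k past cases ranked by similarity, then
        # reward, so the agent can imitate what worked before.
        ranked = sorted(
            self.cases,
            key=lambda c: (self._similarity(task, c.task), c.reward),
            reverse=True,
        )
        return ranked[:k]


# Usage: store two attempts at one task and one at another,
# then retrieve the best match for a related new task.
mem = EpisodicMemory()
mem.write("search for the paper's GitHub repo", "use web search tool", 1.0)
mem.write("search for the paper's GitHub repo", "guess the URL", 0.0)
mem.write("summarise a PDF report", "use file reader tool", 1.0)

best = mem.read("search for a project's GitHub repo", k=1)
print(best[0].action)  # the highest-reward action from the most similar case
```

The key design point is that learning happens entirely through these `write` and `read` operations: the policy improves because better cases get retrieved, not because any model weights change.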

Why it matters?

This research is important because it offers a more practical way to build AI agents that can learn and adapt in real time, much as a person does. It avoids the expensive and time-consuming process of repeatedly retraining the AI model, opening the door to more capable AI systems that can handle complex tasks and continuously acquire new skills, especially in areas like deep research.

Abstract

In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting, namely AgentFly, which attains top-1 on GAIA validation (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7% to 9.6% absolute points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/AgentFly.
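As a rough sketch of the M-MDP idea mentioned in the abstract (notation assumed here, not taken from the paper), the agent conditions its policy on cases read from memory and then rewrites the memory with the new experience:

```latex
% The policy \pi_\theta picks action a_t given state s_t and the cases
% retrieved from the episodic memory M_t; the memory is then updated
% with the new (state, action, reward) experience.
a_t \sim \pi_\theta\!\left(a \mid s_t,\ \mathrm{read}(M_t, s_t)\right),
\qquad
M_{t+1} = \mathrm{write}\!\left(M_t,\ (s_t, a_t, r_t)\right)
```

Under this view, "policy improvement" happens through better retrieval (`read`) and "policy update" through memory rewriting (`write`), with no gradient step on the LLM's parameters.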