Reinforcement Learning for Self-Improving Agent with Skill Library
Jiongxiao Wang, Qiaojing Yan, Yawei Wang, Yijun Tian, Soumya Smruti Mishra, Zhichao Xu, Megha Gandhi, Panpan Xu, Lin Lee Cheong
2025-12-24
Summary
This paper explores how to make AI agents, powered by large language models, better at learning and improving over time, especially when they're used in new and different situations.
What's the problem?
AI agents are really good at complex tasks and conversations, but they often struggle to get consistently better as they encounter new environments. One way to help them is to give them a 'skill library' – a collection of things they've learned to do. However, building these skill libraries is hard because current methods rely heavily on giving the AI very specific instructions, which isn't always reliable or efficient.
What's the solution?
The researchers developed a new system called SAGE, which uses a technique called reinforcement learning to help AI agents build and use skill libraries. Imagine the agent practicing a series of similar tasks, one after another. With each task, it learns new skills and adds them to the library. Then, when it faces the next task, it can reuse those previously learned skills. SAGE also gives the agent extra 'rewards' for generating and using skills effectively, encouraging it to build a useful library. These two components, 'Sequential Rollout' and 'Skill-integrated Reward', are the key parts of their approach.
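The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `run_agent`, the skill-extraction step, and the reward weighting `alpha` are all hypothetical stand-ins for the paper's actual agent rollout, skill generation, and Skill-integrated Reward.

```python
# Hypothetical sketch of SAGE-style Sequential Rollout with a skill-integrated
# reward. The real system uses an LLM agent and GRPO-style RL; here both are
# replaced with toy placeholders to show the control flow only.
from dataclasses import dataclass, field

@dataclass
class SkillLibrary:
    skills: list = field(default_factory=list)

    def add(self, new_skills):
        # Accumulate skills, avoiding duplicates.
        self.skills.extend(s for s in new_skills if s not in self.skills)

def run_agent(task, library):
    """Placeholder for an agent rollout.
    Returns (solved?, skills_used, skills_generated)."""
    used = [s for s in library.skills if task in s]   # toy reuse check
    generated = [f"skill_for_{task}"]                 # toy skill extraction
    return True, used, generated

def sequential_rollout(task_chain, alpha=0.1):
    """Roll out a chain of similar tasks in order; skills generated by
    earlier tasks become available to later ones. The per-task reward is
    the outcome reward plus a skill bonus weighted by alpha."""
    library = SkillLibrary()
    rewards = []
    for task in task_chain:
        solved, used, generated = run_agent(task, library)
        outcome_reward = 1.0 if solved else 0.0
        skill_reward = alpha * (len(used) + len(generated))
        rewards.append(outcome_reward + skill_reward)
        library.add(generated)  # available for subsequent tasks in the chain
    return rewards, library

rewards, lib = sequential_rollout(["book_flight", "book_hotel"])
```

The key design point mirrored here is that the library is threaded through the chain: each task sees every skill produced before it, and the reward explicitly credits skill generation and reuse rather than task outcome alone.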
Why it matters?
This research is important because it makes AI agents more adaptable and efficient. In tests using a simulated app environment, SAGE helped agents complete tasks more accurately, with fewer steps and using less processing power, compared to existing methods. This means AI could become more practical and useful in real-world applications where environments are constantly changing.
Abstract
Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deployed in new environments. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. However, current skill library approaches rely primarily on LLM prompting, making consistent skill library implementation challenging. To overcome these challenges, we propose a Reinforcement Learning (RL)-based approach to enhance agents' self-improvement capabilities with a skill library. Specifically, we introduce Skill Augmented GRPO for self-Evolution (SAGE), a novel RL framework that systematically incorporates skills into learning. The framework's key component, Sequential Rollout, iteratively deploys agents across a chain of similar tasks for each rollout. As agents navigate through the task chain, skills generated from previous tasks accumulate in the library and become available for subsequent tasks. Additionally, the framework enhances skill generation and utilization through a Skill-integrated Reward that complements the original outcome-based rewards. Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.