MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Peng Xia, Jianwen Chen, Xinyu Yang, Haoqin Tu, Jiaqi Liu, Kaiwen Xiong, Siwei Han, Shi Qiu, Haonian Ji, Yuyin Zhou, Zeyu Zheng, Cihang Xie, Huaxiu Yao
2026-03-19
Summary
This paper introduces MetaClaw, a new system for continually improving AI agents powered by large language models (LLMs). It focuses on keeping deployed agents up-to-date and performing well as the kinds of requests they receive change over time.
What's the problem?
Currently, when LLM agents are deployed to help users, they often stay the same even as user needs evolve. Existing solutions aren't ideal: simply saving all past interactions takes up a lot of space and doesn't teach the agent anything new, maintaining a fixed set of skills limits the agent's flexibility, and retraining the entire agent requires taking it offline, which disrupts service. This is especially challenging for platforms handling many different types of tasks.
What's the solution?
MetaClaw tackles this by learning continuously in two complementary ways. First, it analyzes the agent's failed tasks and automatically synthesizes new 'skills' to handle similar situations in the future, without interrupting service. Second, when the system is idle, it incrementally improves the core LLM through gradient-based updates, a technique called 'opportunistic policy optimization,' learning from past successes. The two parts reinforce each other: better skills produce better trajectories for training, and a better core LLM helps create even better skills. A versioning system ensures the agent learns from the right data and is not confused by stale information. Importantly, the design scales to production-size LLMs even without local GPUs.
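The first loop, skill-driven fast adaptation, can be sketched as follows. This is a minimal illustration under assumed interfaces: the names `SkillLibrary`, `evolve_skill`, and `fast_adapt`, and the trajectory dictionary layout, are hypothetical stand-ins, not the paper's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class SkillLibrary:
    """Versioned store of reusable behavioral skills (illustrative)."""
    version: int = 0
    skills: dict = field(default_factory=dict)

    def add(self, name, instruction):
        # Each addition bumps the version, so later policy updates can
        # tell which skill-library state produced which trajectories.
        self.skills[name] = instruction
        self.version += 1

def evolve_skill(failure_trajectory):
    """Hypothetical stand-in for the LLM 'evolver': distill a failure
    into a reusable (name, instruction) skill."""
    task = failure_trajectory["task"]
    return task, f"When handling '{task}', avoid: {failure_trajectory['error']}"

def fast_adapt(library, trajectories):
    """Skill-driven fast adaptation: synthesize skills from failures
    without taking the agent offline; keep successes as training data."""
    successes = []
    for traj in trajectories:
        if traj["success"]:
            successes.append(traj)    # later fed to policy optimization
        else:
            name, instr = evolve_skill(traj)
            library.add(name, instr)  # immediate, zero-downtime improvement
    return successes

library = SkillLibrary()
trajs = [
    {"task": "parse invoice", "success": False, "error": "wrong date format"},
    {"task": "summarize report", "success": True},
]
kept = fast_adapt(library, trajs)
```

The version counter hints at how the paper's versioning mechanism could keep support data (used to synthesize skills) separate from query data (used to evaluate and train the policy).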
Why it matters?
This research is important because it shows how to keep AI agents improving and adapting to new situations without disrupting service. The experiments report significant gains in accuracy and reliability, with the full pipeline raising Kimi-K2.5 accuracy from 21.4% to 40.6%, demonstrating that MetaClaw can make these agents substantially more useful and robust in real-world applications.
Abstract
Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. On platforms like OpenClaw, which handle diverse workloads across 20+ channels, existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. We present MetaClaw, a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills. MetaClaw employs two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories via an LLM evolver to synthesize new skills, enabling immediate improvement with zero downtime. Opportunistic policy optimization performs gradient-based updates via cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM). This is triggered during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors system inactivity and calendar data. These mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. To prevent data contamination, a versioning mechanism separates support and query data. Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without local GPUs. Experiments on MetaClaw-Bench and AutoResearchClaw show that skill-driven adaptation improves accuracy by up to 32% relative. The full pipeline advances Kimi-K2.5 accuracy from 21.4% to 40.6% and increases composite robustness by 18.3%. Code is available at https://github.com/aiming-lab/MetaClaw.
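The scheduling idea behind OMLS, triggering heavy policy updates only during user-inactive windows, can be sketched as below. This is an assumption-laden illustration: the function name, thresholds, and the calendar representation are all invented for the example, not taken from the paper.

```python
import datetime

def should_trigger_update(last_activity, now, upcoming_events,
                          idle_threshold_s=1800, min_window_s=3600):
    """Hypothetical OMLS-style check: run cloud LoRA fine-tuning only
    when the user has been inactive long enough AND no calendar event
    starts before the update could plausibly finish."""
    idle_for = (now - last_activity).total_seconds()
    if idle_for < idle_threshold_s:
        return False  # user was active too recently
    for event_start in upcoming_events:
        if (event_start - now).total_seconds() < min_window_s:
            return False  # calendar shows the user will return soon
    return True

# Example: 3 AM, last activity two hours ago, next event in five hours.
now = datetime.datetime(2026, 3, 19, 3, 0)
last = now - datetime.timedelta(hours=2)
events = [now + datetime.timedelta(hours=5)]
ok = should_trigger_update(last, now, events)  # → True
```

Gating on both recent inactivity and calendar data mirrors the abstract's description of OMLS, ensuring gradient-based updates never compete with live user traffic.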