Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, Guanjun Jiang

2026-03-30

Summary

This paper introduces a new method called Trace2Skill for automatically creating skills for AI agents powered by Large Language Models (LLMs). These skills help the agents perform complex tasks more effectively.

What's the problem?

Currently, giving LLM agents specialized skills is difficult. Writing these skills by hand takes a lot of time and doesn't scale well. Existing automated methods often create skills that are unreliable or only work in very specific situations because they either don't understand the task deeply or they just memorize solutions instead of learning general rules.

What's the solution?

Trace2Skill works by observing how an agent performs a task multiple times. Instead of learning from each attempt one after another, it dispatches many 'sub-agents' to analyze all the attempts *in parallel*. Each sub-agent extracts useful lessons from its attempt, and these lessons are then merged, via inductive reasoning, into a single consistent set of skills with conflicts resolved. The process can deepen skills written by humans or create entirely new skills from scratch. Because the resulting skills are declarative text rather than changes to a model's weights, they transfer easily between models.
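The pipeline described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the helper names (`extract_lesson`, `consolidate`, `trace2skill`) and the plain-string lesson format are assumptions for demonstration, and in the real system each step would be an LLM sub-agent call rather than a simple function.

```python
# Toy sketch of a Trace2Skill-style flow: parallel lesson extraction
# from trajectories, then consolidation into a deduplicated skill list.
from concurrent.futures import ThreadPoolExecutor


def extract_lesson(trajectory: list[str]) -> str:
    """Stand-in for a sub-agent that distills one trajectory into a lesson.

    Here we pretend the first step is the goal and the last step is the
    action worth remembering.
    """
    return f"when goal is '{trajectory[0]}', end with '{trajectory[-1]}'"


def consolidate(lessons: list[str]) -> list[str]:
    """Stand-in for hierarchical consolidation: collapse duplicates so the
    resulting 'skill directory' is consistent and conflict-free."""
    return sorted(set(lessons))


def trace2skill(trajectories: list[list[str]]) -> list[str]:
    # Analyze all trajectories at the same time rather than one after
    # another, mirroring the parallel fleet of sub-agents.
    with ThreadPoolExecutor() as pool:
        lessons = list(pool.map(extract_lesson, trajectories))
    return consolidate(lessons)


if __name__ == "__main__":
    runs = [
        ["sum column B", "open sheet", "SUM(B:B)"],
        ["sum column B", "select B", "SUM(B:B)"],  # same lesson, collapses
        ["average col C", "select C", "AVERAGE(C:C)"],
    ]
    for skill in trace2skill(runs):
        print(skill)
```

The key design point this sketch preserves is that the per-trajectory analyses are independent, so they can run concurrently, and only the final consolidation step reasons over all lessons at once.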

Why it matters?

This research is important because it shows we can automatically create robust and reusable skills for LLM agents without needing to change the underlying AI model itself. These skills can significantly improve performance, and skills created using a smaller model can even boost much larger ones. They also hold up when the task differs from the ones the skills were learned on. This means we can build more capable and adaptable AI agents more easily.

Abstract

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet, manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheet, VisionQA and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to OOD settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills -- requiring no parameter updates, no external retrieval modules, and utilizing open-source models as small as 35B parameters.