SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu, Xunliang Cai, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
2026-04-03
Summary
This paper explores a new way to give large language models (LLMs) the ability to use tools and complete tasks without needing to constantly look up instructions during use. It's about teaching the model to 'internalize' skills instead of just retrieving them when needed.
What's the problem?
Currently, LLMs often rely on 'skills' – essentially packaged sets of instructions for using tools – that are retrieved and loaded while the model is working. This has a few downsides: retrieval sometimes surfaces the wrong skills, adding irrelevant guidance; the injected instructions take up valuable context space because the model has to re-read them every time; and the model never actually *learns* the skills, it just follows them.
What's the solution?
The researchers developed a system called SKILL0 that trains the model to learn skills directly during the training process. They start by showing the model the skills, then gradually remove the explicit instructions, forcing the model to figure things out on its own. Skills are presented visually, showing how to use tools and complete tasks over multiple steps. The system also intelligently decides which skills are still helpful to the model as it learns, discarding those that aren't needed anymore, ultimately leading to a model that can perform tasks 'zero-shot' – meaning without any external skill retrieval.
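The skill-withdrawal idea above can be sketched in a few lines: keep only the skills the current policy still benefits from, capped by a budget that decays linearly to zero over training. This is an illustrative sketch, not the SKILL0 implementation; the function names, the helpfulness scores, and the skill names are all hypothetical.

```python
def linear_budget(step, total_steps, max_skills):
    """Skill budget decays linearly from max_skills to 0 over training."""
    remaining = 1.0 - step / total_steps
    return max(0, round(max_skills * remaining))

def select_skills(skill_scores, budget):
    """Keep only skills the current policy still benefits from
    (positive helpfulness), up to the decaying budget."""
    helpful = [(name, s) for name, s in skill_scores.items() if s > 0]
    helpful.sort(key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in helpful[:budget]]

# Hypothetical scores: e.g. the success-rate gain when a skill is in context.
scores = {"navigate": 0.12, "pick_place": 0.02, "open_drawer": -0.01}
print(select_skills(scores, linear_budget(step=500, total_steps=1000, max_skills=3)))
# → ['navigate', 'pick_place']
```

Once the budget reaches zero, no skill files are injected at all and the agent runs in the fully zero-shot setting the paper describes.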
Why it matters?
This work is important because it makes LLM agents more efficient and capable. By internalizing skills, the model doesn't waste time and resources on retrieving information during use, and it can perform tasks more reliably. The improvements shown in tasks like controlling virtual environments and answering questions demonstrate that this approach can significantly enhance the performance of LLM-powered agents.
Abstract
Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 introduces a training-time curriculum that begins with full skill context and progressively withdraws it. Skills are grouped offline by category and rendered with interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% for ALFWorld and +6.6% for Search-QA), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.