SkillFactory: Self-Distillation For Learning Cognitive Behaviors
Zayne Sprague, Jack Lu, Manya Wadhwa, Sedrick Keh, Mengye Ren, Greg Durrett
2025-12-04
Summary
This paper introduces SkillFactory, a new method for teaching AI models complex thinking skills like checking their work and trying different approaches. It focuses on getting models ready to learn these skills *before* using a more complex training process called reinforcement learning.
What's the problem?
AI models are getting better at reasoning, often by thinking through problems step-by-step. However, this usually requires the model to *already* have some basic skills like verifying answers or knowing when to try a different method. It's hard to teach these skills to a model if it doesn't show them naturally to begin with, and previous methods relied on having a really strong 'teacher' model to learn from.
What's the solution?
SkillFactory works by taking the AI model's own attempts at solving problems and rearranging them to *look* like examples of the desired skills. Imagine taking a messy first draft and editing it to highlight the parts where the model checked its work. These examples, though not perfect, help 'prime' the model to learn those skills more effectively when it's later trained with reinforcement learning. It's like giving the model a head start by showing it what good reasoning looks like, even if it didn't produce it perfectly on its own.
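The rearrangement step described above can be sketched in a few lines. This is a minimal illustration, assuming a verification-then-retry skill format: it splices an incorrect and a correct sample from the same model into one trace that looks like the model checked its work and retried. The function name, cue phrasings, and trace layout are invented for this example; the paper's actual trace format may differ.

```python
def build_silver_trace(problem, wrong_attempt, correct_attempt):
    """Rearrange two of the model's own samples into a "silver" trace
    that exhibits a verify-and-retry skill (illustrative format only)."""
    # Cue phrasings are assumptions, not the paper's actual wording.
    verify_cue = "Wait, let me check this answer."
    backtrack_cue = "That doesn't look right. Let me try another approach."
    return "\n".join([
        f"Problem: {problem}",
        wrong_attempt,      # the model's own failed sample
        verify_cue,         # inserted verification cue
        backtrack_cue,      # inserted backtracking cue
        correct_attempt,    # the model's own successful sample
    ])

# Usage: pair an incorrect sample with a correct one for the same problem.
trace = build_silver_trace(
    "What is 17 * 24?",
    "17 * 24 = 398.",
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
)
print(trace)
```

The key design point is that neither attempt needs to be perfect: the spliced trace only has to *look* like skill use, which is enough to prime the model for reinforcement learning afterward.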
Why does it matter?
This research is important because it shows we can teach AI models valuable thinking skills even if they don't already possess them. By preparing the model with these 'skill-based' examples before more advanced training, the model becomes better at handling harder problems, more reliably uses those skills, and is less likely to mess up when faced with new, unfamiliar situations. This means more robust and trustworthy AI systems.
Abstract
Reasoning models leveraging long chains of thought employ various cognitive skills, such as verification of their answers, backtracking, retrying by an alternate method, and more. Previous work has shown that when a base language model exhibits these skills, further training with reinforcement learning (RL) can teach it to leverage them. How can we get models to leverage skills that aren't exhibited by base models? Our work, SkillFactory, is a method for fine-tuning models to roughly learn these skills during a supervised fine-tuning (SFT) stage prior to RL. Our approach does not rely on distillation from a stronger model, but instead uses samples from the model itself, rearranged to provide training data in the format of those skills. These "silver" SFT traces may be imperfect, but are nevertheless effective for priming a model to acquire skills during RL. Our evaluation shows that (1) starting from SkillFactory SFT initialization helps a model to generalize to harder variants of a task post-RL, despite lower performance pre-RL; (2) cognitive skills are indeed used by the model; (3) RLed SkillFactory models are more robust to regression on out-of-domain tasks than RLed base models. Our work suggests that inductive biases learned prior to RL help models learn robust cognitive skill use.