Skill-Targeted Adaptive Training

Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora

2025-10-14

Summary

This paper introduces a new way to improve language models when they struggle to keep improving on tasks similar to ones in their training data, such as math problems. The key idea is to identify and directly address the specific skills the model is missing.

What's the problem?

Often, when you try to improve a language model by simply showing it more examples of a task it already partly knows, it stops improving significantly. This is called 'saturation': the model hits a ceiling even with more data. This happens because the model may be missing specific skills needed to solve the problems, and more of the same examples doesn't necessarily teach it those missing skills.

What's the solution?

The researchers developed a method called STAT, which uses a stronger language model as a 'teacher'. The teacher analyzes the task and lists the skills it requires. It then examines the student model's attempts and identifies which skills the student consistently fails to apply, producing a 'Missing-Skill-Profile'. Based on this profile, STAT builds a modified training set in one of two ways: STAT-Sel reweights existing training examples toward those that require the missing skills, while STAT-Syn synthesizes entirely new examples that practice those skills.
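The profiling-and-reweighting loop above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the data formats and the additive weighting rule are assumptions, and in practice the skill labels and failure judgments would come from the teacher LLM.

```python
from collections import Counter

def build_missing_skill_profile(attempts):
    """Estimate how often the student misapplies each skill.

    `attempts` is a hypothetical list of (required_skills, failed_skills)
    pairs, where the teacher model has labeled each problem with the
    skills it needs and judged which ones the student got wrong.
    Returns a failure rate per skill (the Missing-Skill-Profile).
    """
    required = Counter()
    failed = Counter()
    for req, missed in attempts:
        required.update(req)
        failed.update(missed)
    return {skill: failed[skill] / required[skill] for skill in required}

def reweight_examples(examples, profile, base_weight=1.0):
    """STAT-Sel-style reweighting (sketch, not the paper's exact scheme).

    `examples` is a list of (required_skills, example) pairs. Examples
    exercising skills the student often fails get a larger weight, so
    fine-tuning concentrates on the skill gaps.
    """
    weighted = []
    for skills, example in examples:
        boost = sum(profile.get(s, 0.0) for s in skills)
        weighted.append((base_weight + boost, example))
    return weighted
```

In a full pipeline, the resulting weights would be used to sample (or scale the loss of) examples during supervised fine-tuning; STAT-Syn would instead prompt the teacher to generate fresh problems targeting the highest-failure-rate skills.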

Why it matters?

This work is important because it shows a way to break through the 'saturation' problem in language models. By specifically targeting skill gaps, the models can continue to improve, even on tasks they've seen before. It also works well *with* other improvement techniques, like reinforcement learning, making the overall training process more effective. This means we can build smarter and more capable AI systems.

Abstract

Language models often show little to no improvement (i.e., "saturation") when trained via vanilla supervised fine-tuning (SFT) on data similar to what they saw in their training set (e.g., MATH). We introduce a new fine-tuning strategy, STAT, to train such a student model by using the metacognition ability of a stronger large language model (LLM) as the teacher. The teacher uses the task dataset to create a list of skills needed for the task, and then labels each data point with its required skills (Didolkar et al., 2024). By monitoring the student's answers, the teacher creates a Missing-Skill-Profile for the student, tracking how often they failed to apply each skill in their responses. We use this idea to build a modified training set in one of two ways. In STAT-Sel, the teacher uses an existing set of training examples but adaptively reweights them according to the Missing-Skill-Profile. In STAT-Syn, the teacher synthesizes additional examples involving missing skills. Across extensive experiments on Llama and Qwen models, our methods yield improvements of up to 7.5% on MATH, whereas SFT provides only limited gains. Furthermore, STAT enhances performance on out-of-distribution benchmarks (e.g., AIME24/25, AMC23, etc.) by an average of 4.6%. Crucially, we find that STAT is complementary to RL via GRPO (Shao et al., 2024): after the model is improved using STAT to address skill gaps, GRPO continues to add further gains. We conclude that skill-targeted adaptive training should broadly improve current training pipelines. Our code is available at: https://github.com/princeton-pli/STAT.