Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, Noah D. Goodman
2025-03-04
Summary
This paper looks at how to make AI language models better at solving complex problems by teaching them the thinking habits that skilled human problem solvers use. The researchers found that some AI models naturally use these habits while others don't, and that this difference determines how well a model can improve itself with further training.
What's the problem?
Some AI language models get much better at solving problems when given more time to think, while others don't improve much at all. Researchers wanted to understand why this happens and how to help the AI models that aren't improving.
What's the solution?
The researchers identified four key thinking habits that both expert humans and successful AI models use: checking their work (verification), going back to fix mistakes (backtracking), breaking big problems into smaller parts (subgoal setting), and working backwards from the goal (backward chaining). Teaching these habits to AI models that didn't naturally use them led to large improvements when those models were trained to solve problems. Notably, the habits mattered more than correctness: models primed with wrong answers that still showed these reasoning patterns improved just as much as models primed with right ones.
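The paper studies these habits on the game of Countdown, where a model must combine a set of numbers with arithmetic to hit a target value. As a toy illustration (this is not the paper's code), a small brute-force solver makes the backtracking, subgoal-setting, and verification habits concrete:

```python
from itertools import permutations

def solve(numbers, target):
    """Try to reach `target` by combining `numbers` with +, -, *, /.

    Returns a list of step strings, or None if no solution exists.
    """
    # Verification: check whether the single remaining number hits the goal.
    if len(numbers) == 1:
        return [] if numbers[0] == target else None

    for a, b, *rest in permutations(numbers):
        for op, result in ((" + ", a + b), (" - ", a - b), (" * ", a * b)):
            # Subgoal setting: reduce to a smaller problem with one fewer number.
            steps = solve([result] + rest, target)
            if steps is not None:
                return [f"{a}{op}{b} = {result}"] + steps
        if b != 0 and a % b == 0:  # division only when it stays an integer
            steps = solve([a // b] + rest, target)
            if steps is not None:
                return [f"{a} / {b} = {a // b}"] + steps
        # Backtracking: no operation on (a, b) led anywhere; try another pair.
    return None

print(solve([25, 4, 3], 97))  # → ['25 * 4 = 100', '100 - 3 = 97']
```

A search procedure gets these behaviors for free from its control flow; the paper's point is that a language model only benefits from extra thinking time if its *text* already exhibits analogous moves.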
Why it matters?
This matters because it helps us understand how to make AI systems smarter and more effective at solving complex problems. By teaching AI these human-like thinking habits, we can create more capable and adaptable AI assistants that handle a wider range of tasks and challenges. This could lead to significant improvements in fields like scientific research and decision-making, where AI is increasingly being used.
Abstract
Test-time inference has emerged as a powerful paradigm for enabling language models to "think" longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhibit substantial gains while others quickly plateau. For instance, we find that Qwen-2.5-3B far exceeds Llama-3.2-3B under identical RL training for the game of Countdown. This discrepancy raises a critical question: what intrinsic properties enable effective self-improvement? We introduce a framework to investigate this question by analyzing four key cognitive behaviors -- verification, backtracking, subgoal setting, and backward chaining -- that both expert human problem solvers and successful language models employ. Our study reveals that Qwen naturally exhibits these reasoning behaviors, whereas Llama initially lacks them. In systematic experimentation with controlled behavioral datasets, we find that priming Llama with examples containing these reasoning behaviors enables substantial improvements during RL, matching or exceeding Qwen's performance. Importantly, the presence of reasoning behaviors, rather than correctness of answers, proves to be the critical factor -- models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions. Finally, leveraging continued pretraining with OpenWebMath data, filtered to amplify reasoning behaviors, enables the Llama model to match Qwen's self-improvement trajectory. Our findings establish a fundamental relationship between initial reasoning behaviors and the capacity for improvement, explaining why some language models effectively utilize additional computation while others plateau.
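The abstract's last intervention, continued pretraining on OpenWebMath filtered to amplify reasoning behaviors, can be sketched at a high level. The abstract does not say how the filtering was implemented, so the keyword heuristic below is an illustrative stand-in for whatever scoring method the authors actually used, and all cue phrases are assumptions:

```python
import re

# Hypothetical cue phrases for each cognitive behavior. These are NOT from
# the paper; they are an illustrative stand-in for its filtering method.
BEHAVIOR_CUES = {
    "verification": [r"let'?s check", r"verify", r"double[- ]check"],
    "backtracking": [r"that doesn'?t work", r"go back", r"try (again|another)"],
    "subgoal_setting": [r"first,", r"break (this|it) (down|into)"],
    "backward_chaining": [r"work(ing)? backwards?", r"to reach the (target|goal)"],
}

def behavior_score(text):
    """Count how many distinct behaviors a document shows at least one cue for."""
    lowered = text.lower()
    return sum(
        any(re.search(pattern, lowered) for pattern in patterns)
        for patterns in BEHAVIOR_CUES.values()
    )

def filter_corpus(docs, min_behaviors=2):
    """Keep documents that exhibit at least `min_behaviors` distinct behaviors."""
    return [d for d in docs if behavior_score(d) >= min_behaviors]

docs = [
    "First, break it into parts. Let's check: 3 * 4 = 12. That doesn't work, so go back.",
    "The weather was pleasant throughout the conference.",
]
print(filter_corpus(docs))  # keeps only the first document
```

The point of such a filter is selection pressure: continued pretraining on text rich in these behavioral markers gives the model the reasoning vocabulary that RL can then amplify.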