Rethinking Reflection in Pre-Training

Essential AI, Darsh J Shah, Peter Rushton, Somanshu Singla, Mohit Parmar, Kurt Smith, Yash Vanjani, Ashish Vaswani, Adarsh Chaluvaraju, Andrew Hojel, Andrew Ma, Anil Thomas, Anthony Polloreno, Ashish Tanwer, Burhan Drak Sibai, Divya S Mansingka, Divya Shivaprasad, Ishaan Shah, Karl Stratos, Khoi Nguyen, Michael Callahan, Michael Pust

2025-04-08

Summary

This paper shows that AI language models begin learning to catch and fix their own mistakes during basic pre-training, much like how you might check your math homework for errors.

What's the problem?

People assumed AI models only learned to correct mistakes through special advanced training (like reinforcement learning), but this study shows they start developing this skill much earlier, during their basic learning phase.

What's the solution?

The researchers tested this by giving AI models reasoning chains containing intentional mistakes and measuring how well models at different stages of basic training could spot the errors and still reach the right answer.

Why it matters?

This helps us understand how AI learns to think more like humans, which could lead to better AI tutors or assistants that can catch their own errors before giving wrong answers.

Abstract

A language model's ability to reflect on its own reasoning provides a key advantage for solving complex problems. While most recent research has focused on how this ability develops during reinforcement learning, we show that it actually begins to emerge much earlier - during the model's pre-training. To study this, we introduce deliberate errors into chains-of-thought and test whether the model can still arrive at the correct answer by recognizing and correcting these mistakes. By tracking performance across different stages of pre-training, we observe that this self-correcting ability appears early and improves steadily over time. For instance, an OLMo2-7B model pre-trained on 4 trillion tokens displays self-correction on our six self-reflection tasks.
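The evaluation described in the abstract can be sketched in a few lines: corrupt one step of a chain-of-thought, hand the corrupted prefix to the model, and score whether the continuation still reaches the correct answer. The sketch below is illustrative only; the helper names (`inject_error`, `recovers`) are hypothetical and the model call is stubbed out, whereas the paper uses real pre-training checkpoints and six task suites.

```python
# Minimal sketch of an adversarial-reflection check (hypothetical helpers,
# not the paper's actual code). One step of a chain-of-thought is corrupted,
# and the metric is whether the continuation recovers the correct answer.

def inject_error(cot_steps, step_idx, wrong_value):
    """Replace the numeric result of one reasoning step with a wrong value."""
    corrupted = list(cot_steps)
    corrupted[step_idx] = corrupted[step_idx].split("=")[0] + f"= {wrong_value}"
    return corrupted

def recovers(final_answer, correct_answer):
    """Scoring: did the model arrive at the right answer despite the error?"""
    return final_answer == correct_answer

# Example: a two-step arithmetic chain-of-thought for (3 + 4) * 2 = 14.
cot = ["3 + 4 = 7", "7 * 2 = 14"]
corrupted = inject_error(cot, 0, 9)  # pretend the chain claims 3 + 4 = 9
# A reflective model, continuing from the corrupted prefix, might write:
# "Wait, 3 + 4 is 7, not 9, so the answer is 7 * 2 = 14."
# In the real setup, the corrupted prefix is fed to a pre-training
# checkpoint and recovers(model_answer, 14) is tallied across tasks.
```

In the paper, tracking this recovery rate across checkpoints (e.g. OLMo2-7B at increasing token counts) is what reveals self-correction emerging steadily during pre-training.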