Test-Time Scaling with Reflective Generative Model
Zixiao Wang, Yuxin Wang, Xiaorui Wang, Mengting Xing, Jie Gao, Jianjun Xu, Guangcan Liu, Chenhui Jin, Zhuo Wang, Shengzhuo Zhang, Hongtao Xie
2025-07-14
Summary
This paper introduces MetaStone-S1, a reflective generative model that both generates reasoning steps and evaluates them within a single network, improving reasoning with little added computational cost.
What's the problem?
Test-time scaling methods often rely on a separate process reward model (PRM) to judge each reasoning step, and training such a model usually requires large amounts of step-level human annotation. The extra model and the labeled data make these approaches slow and expensive.
What's the solution?
MetaStone-S1 merges the component that generates reasoning steps and the component that scores them into one shared network. The scoring component is trained in a self-supervised way: it learns only from whether the final answer is correct, rather than from detailed step-by-step human feedback. At inference time, the model can also adjust how much reasoning effort it spends, trading extra computation for higher accuracy when a problem calls for it.
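The self-supervised training signal described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the step scores are taken as given (in the model they would come from a scoring head on the shared network), every step inherits a pseudo-label from the final answer (1 if correct, 0 otherwise), and a binary cross-entropy loss is averaged over the steps. The helper names and the rule that skips steps whose predicted score disagrees with the pseudo-label are hypothetical choices made for illustration.

```python
import math
from typing import List


def step_pseudo_labels(num_steps: int, final_answer_correct: bool) -> List[int]:
    """Hypothetical helper: every step inherits the outcome label of the final answer."""
    label = 1 if final_answer_correct else 0
    return [label] * num_steps


def self_supervised_step_loss(step_scores: List[float], final_answer_correct: bool,
                              threshold: float = 0.5, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over steps whose prediction agrees with the pseudo-label.

    Assumption (not confirmed by the summary): steps whose thresholded score
    disagrees with the outcome label are treated as noisy and left out of the loss.
    """
    labels = step_pseudo_labels(len(step_scores), final_answer_correct)
    losses = []
    for p, y in zip(step_scores, labels):
        predicted = 1 if p > threshold else 0
        if predicted != y:
            continue  # assumed filter: skip steps that disagree with the outcome label
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses) if losses else 0.0


# A correct trajectory with three scored steps; the third step (0.3) is skipped.
loss = self_supervised_step_loss([0.9, 0.8, 0.3], final_answer_correct=True)
print(round(loss, 3))  # prints 0.164
```

Because the labels come only from the final answer, no step-level human annotation is needed; the filtering rule is one possible way to limit the noise that outcome-only labels introduce.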
Why does it matter?
Combining reasoning and self-evaluation in one network lowers the parameter and annotation cost of test-time scaling. The resulting model handles complex math, coding, and language problems while using fewer resources than approaches that need a separate verifier, which could make strong reasoning models cheaper to run and more widely accessible.
Abstract
MetaStone-S1, a reflective generative model built on a self-supervised process reward model (SPRM), achieves high performance with fewer parameters and supports test-time scaling.
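One way a process reward score can drive test-time scaling is best-of-k selection: sample k candidate trajectories, score each step, aggregate the step scores into one trajectory score, and keep the highest-scoring candidate. The sketch below assumes this scheme; the candidate data is a fixed toy example standing in for sampled trajectories, and the geometric-mean aggregation, the `Candidate` type, and the function names are all hypothetical, not the paper's API.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate: (final answer, per-step scores in (0, 1] from a scoring head).
Candidate = Tuple[str, Sequence[float]]


def trajectory_score(step_scores: Sequence[float]) -> float:
    """Aggregate step scores with a geometric mean (assumed aggregation rule)."""
    if not step_scores:
        return 0.0
    log_sum = sum(math.log(max(s, 1e-12)) for s in step_scores)
    return math.exp(log_sum / len(step_scores))


def best_of_k(candidates: List[Candidate], k: int) -> str:
    """Return the answer of the highest-scoring trajectory among the first k candidates.

    A larger k corresponds to a larger reasoning budget at test time.
    """
    if k < 1 or not candidates:
        raise ValueError("need k >= 1 and at least one candidate")
    pool = candidates[:k]
    best_answer, _ = max(pool, key=lambda c: trajectory_score(c[1]))
    return best_answer


candidates: List[Candidate] = [
    ("42", [0.9, 0.4, 0.5]),
    ("41", [0.7, 0.8, 0.9]),
    ("42", [0.95, 0.9, 0.85]),
]
print(best_of_k(candidates, k=1))  # prints 42
print(best_of_k(candidates, k=2))  # prints 41
print(best_of_k(candidates, k=3))  # prints 42
```

The geometric mean lets a single low-scoring step pull the whole trajectory down, which suits the idea that one faulty step can invalidate a chain of reasoning; other aggregations (minimum, mean of the final steps) are equally plausible choices.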