Test-Time Scaling with Reflective Generative Model

Zixiao Wang, Yuxin Wang, Xiaorui Wang, Mengting Xing, Jie Gao, Jianjun Xu, Guangcan Liu, Chenhui Jin, Zhuo Wang, Shengzhuo Zhang, Hongtao Xie

2025-07-14

Summary

This paper introduces MetaStone-S1, a reflective generative model: an AI model that both generates candidate reasoning and judges it within the same system, improving reasoning at little extra computational cost.

What's the problem?

Many AI models rely on a separate system to evaluate their reasoning, or on large amounts of human-labeled data to tell whether each step is correct. Both make them slower and more expensive to build and run.

What's the solution?

MetaStone-S1 combines the part that generates reasoning steps and the part that judges those steps into one shared network. The judging part is trained with a self-supervised process that learns from whether the final answer is correct, rather than from detailed human feedback on each step. The model can also adjust how deeply it thinks at test time, so it can spend more computation when higher accuracy is needed and less when speed matters.
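The generate-then-judge idea can be illustrated with a small best-of-N selection sketch. This is not the MetaStone-S1 implementation: `generate`, `score_steps`, `num_candidates`, and the geometric-mean aggregation are all hypothetical names and choices made for illustration, and the two roles are plain Python callables here rather than parts of one shared network.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Steps = List[str]


def aggregate_step_scores(step_scores: Sequence[float]) -> float:
    """Combine per-step scores in (0, 1] into one trajectory score.

    Assumption: a geometric mean, so one weak step lowers the whole score.
    """
    if not step_scores:
        return 0.0
    eps = 1e-12  # guard against log(0)
    total = sum(math.log(max(s, eps)) for s in step_scores)
    return math.exp(total / len(step_scores))


def select_best(
    question: str,
    generate: Callable[[str], Steps],
    score_steps: Callable[[str, Steps], Sequence[float]],
    num_candidates: int,
) -> Tuple[Steps, float]:
    """Sample several reasoning trajectories and keep the highest-scoring one.

    A larger num_candidates means more test-time computation.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    best_steps: Optional[Steps] = None
    best_score = -1.0
    for _ in range(num_candidates):
        steps = generate(question)  # hypothetical generator
        score = aggregate_step_scores(score_steps(question, steps))
        if score > best_score:
            best_steps, best_score = steps, score
    return best_steps, best_score


# Toy stand-ins for the generator and the step judge (illustration only).
_pool = iter([["2+2=5", "answer: 5"], ["2+2=4", "answer: 4"]])


def toy_generate(question: str) -> Steps:
    return next(_pool)


def toy_score_steps(question: str, steps: Steps) -> List[float]:
    return [0.9 if "4" in s else 0.2 for s in steps]


best, score = select_best(
    "What is 2+2?", toy_generate, toy_score_steps, num_candidates=2
)
```

In this sketch, raising `num_candidates` is the knob that scales thinking effort at test time: more sampled trajectories cost more computation but give the judge more options to choose from.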

Why does it matter?

This matters because it yields a more efficient AI that can solve complex problems in math, coding, and language more accurately while using fewer resources, which advances AI reasoning capabilities and makes powerful AI more accessible.

Abstract

MetaStone-S1, a reflective generative model using a self-supervised process reward model, achieves high performance with fewer parameters and supports test-time scaling.