REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
Zhuoshi Pan, Qizhi Pei, Yu Li, Qiyao Sun, Zinan Tang, H. Vicky Zhao, Conghui He, Lijun Wu
2025-07-15
Summary
This paper introduces REST, a stress-testing framework for evaluating how well large reasoning models perform when asked to solve multiple problems in a single prompt instead of one at a time.
What's the problem?
Most current benchmarks present models with one problem at a time. This fails to reflect real-world use, where multiple questions or tasks often arrive together, and it hides how models degrade or make mistakes under that added load.
What's the solution?
The researchers created REST to combine multiple questions into a single prompt and ask the model to answer all of them at once. This stress test reveals how models perform under a higher reasoning load and exposes differences between models that appear similar on single-question benchmarks.
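To make the idea concrete, here is a minimal sketch of how several independent problems could be combined into one prompt. The helper name `build_rest_prompt` and the exact instruction wording are illustrative assumptions; the paper's actual prompt template may differ.

```python
def build_rest_prompt(questions):
    """Concatenate several independent problems into a single prompt,
    asking the model to answer each one in order.
    (Hypothetical helper; not the paper's exact template.)"""
    header = (
        "Solve each of the following problems. "
        "Label your answers 'Answer 1:', 'Answer 2:', and so on.\n\n"
    )
    # Number each problem so answers can be matched back to questions.
    body = "\n\n".join(
        f"Problem {i + 1}: {q}" for i, q in enumerate(questions)
    )
    return header + body

prompt = build_rest_prompt([
    "What is 17 * 23?",
    "A train travels 60 km in 45 minutes. What is its speed in km/h?",
])
```

The resulting prompt is then sent to the model as a single request, so its accuracy can be compared against answering the same questions one by one.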
Why does it matter?
REST helps uncover the real strengths and weaknesses of reasoning models, making it possible to improve them for real-world scenarios where problems are more complex and arrive simultaneously.
Abstract
REST, a stress-testing framework, evaluates large reasoning models under simultaneous multi-problem conditions, revealing performance differences and insights into model behavior under real-world reasoning demands.