Does More Inference-Time Compute Really Help Robustness?
Tong Wu, Chong Xiang, Jiachen T. Wang, Weichen Yu, Chawin Sitawarin, Vikash Sehwag, Prateek Mittal
2025-07-23
Summary
This paper examines inference-time scaling: giving an AI model extra computation while it generates an answer, typically by letting it produce longer chains of intermediate reasoning. The authors find that this extra compute makes large models more accurate and robust, but can actually make smaller models less robust when their intermediate reasoning steps are exposed during inference.
What's the problem?
Larger AI models generally become more reliable when given extra computation at answer time, but smaller models can become less reliable when the extra reasoning steps they generate are exposed: visible intermediate reasoning gives attackers more surface to exploit, which is especially concerning in settings where safety and security matter.
What's the solution?
The researchers systematically studied how inference-time scaling affects the robustness of models of different sizes and found that more compute is not always better. Their results show that whether, when, and how to apply inference-time scaling must be chosen carefully, especially for security-sensitive tasks where intermediate reasoning may be exposed.
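The kind of study described here amounts to a sweep over the inference-time compute budget: hold the model fixed, vary how much reasoning it is allowed, and measure robustness against adversarial prompts at each budget. Below is a minimal, hypothetical sketch of such a harness; `query_model`, `is_unsafe`, and the prompt list are placeholders I introduce for illustration, not the authors' actual benchmark or API.

```python
from typing import Callable

def attack_success_rate(
    query_model: Callable[[str, int], str],
    adversarial_prompts: list[str],
    is_unsafe: Callable[[str], bool],
    reasoning_budget: int,
) -> float:
    """Fraction of adversarial prompts that elicit an unsafe response
    when the model may spend `reasoning_budget` reasoning tokens."""
    hits = sum(
        is_unsafe(query_model(prompt, reasoning_budget))
        for prompt in adversarial_prompts
    )
    return hits / len(adversarial_prompts)

# Hypothetical stand-ins: a real study would wire these to an actual
# model API and a safety judge/classifier.
def query_model(prompt: str, reasoning_budget: int) -> str:
    return "[model response]"  # replace with a real model call

def is_unsafe(response: str) -> bool:
    return False  # replace with a real safety classifier

if __name__ == "__main__":
    prompts = ["<adversarial prompt 1>", "<adversarial prompt 2>"]
    # Sweep the inference-time compute budget and track how robustness moves;
    # the paper's finding is that the trend depends on model size and on
    # whether intermediate reasoning is exposed.
    for budget in (256, 1024, 4096):
        asr = attack_success_rate(query_model, prompts, is_unsafe, budget)
        print(f"budget={budget}: attack success rate = {asr:.2f}")
```

Lower attack success rate at higher budgets would indicate that extra inference-time compute helps robustness; the paper's point is that this cannot be assumed across model scales.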
Why it matters?
Understanding when inference-time scaling helps and when it hurts lets practitioners apply the right amount of compute to the right models, making AI systems safer and more dependable in real-world, security-sensitive deployments.
Abstract
Inference-time scaling improves robustness in large models but can reduce robustness in smaller models if intermediate reasoning steps are exposed, highlighting the need for careful consideration in security-sensitive applications.