Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Tingchen Fu, Jiawei Gu, Yafu Li, Xiaoye Qu, Yu Cheng
2025-05-23
Summary
This paper shows that making AI models better at complex reasoning can sometimes make them worse at following instructions, especially when solving math problems.
What's the problem?
As language models become more capable of advanced reasoning, they tend to pay less attention to the instructions they are given, producing answers that are technically impressive but not what was actually asked for.
What's the solution?
The researchers studied this issue with MathIF, a benchmark that measures how well large reasoning models follow instructions while solving math problems, and found that stronger reasoning ability often comes at the cost of instruction following.
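To make the evaluation idea concrete, here is a minimal sketch of how a MathIF-style check might score instruction following: each prompt carries verifiable constraints (for example, a word limit or a required answer format), and a response counts as compliant only if it satisfies all of them. The constraint names, data layout, and helper functions below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of constraint-based instruction-following evaluation.
# The specific constraints and data format are hypothetical, not MathIF's code.
import re

def follows_length_limit(response: str, max_words: int) -> bool:
    """Verifiable length constraint: response has at most max_words words."""
    return len(response.split()) <= max_words

def follows_boxed_format(response: str) -> bool:
    """Verifiable format constraint: final answer appears inside \\boxed{...}."""
    return re.search(r"\\boxed\{.+?\}", response) is not None

def instruction_following_rate(examples: list) -> float:
    """Fraction of responses satisfying every constraint attached to their prompt."""
    passed = 0
    for ex in examples:
        checks = []
        if "max_words" in ex["constraints"]:
            checks.append(follows_length_limit(ex["response"],
                                               ex["constraints"]["max_words"]))
        if ex["constraints"].get("boxed_answer"):
            checks.append(follows_boxed_format(ex["response"]))
        passed += all(checks)  # compliant only if every check passes
    return passed / len(examples)

examples = [
    {"constraints": {"max_words": 50, "boxed_answer": True},
     "response": "The sum is \\boxed{42}."},
    {"constraints": {"max_words": 5, "boxed_answer": True},
     "response": "After a long chain of thought spanning many words, \\boxed{42}."},
]
print(instruction_following_rate(examples))  # 0.5
```

The second response answers correctly but exceeds its word limit, which is exactly the kind of mismatch between a smart answer and an obedient one that this style of evaluation is designed to surface.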
Why it matters?
This matters because it shows that just making AI smarter isn't enough—we also need to make sure it listens and follows directions, especially when it's used for important tasks like homework help, tutoring, or problem solving.
Abstract
An empirical analysis using MathIF, a benchmark for instruction following in mathematical reasoning tasks, identifies a consistent tension between enhancing reasoning capacity and maintaining instruction adherence in large language models.