Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu

2025-01-31

Summary

This paper looks at a problem called 'underthinking' in advanced, o1-like AI language models, especially when they try to solve complex reasoning tasks. The researchers found that these models often jump between different lines of thought too quickly instead of finishing any of them, which can lead to wrong answers.

What's the problem?

The problem is that AI language models, even really smart ones, sometimes think too shallowly. They switch between different ideas too quickly without fully exploring any single one. It's like a student who keeps changing their approach to a math problem without ever finishing a single method. As a result, the AI often gives wrong answers, especially on tough math questions.

What's the solution?

To solve this, the researchers did a few things. First, they studied the problem carefully on challenging math benchmarks and found that wrong answers tend to involve a lot of switching between thoughts. Then, they created a new way to measure how much a model is 'underthinking' by checking, in wrong answers, how efficiently the generated tokens actually contribute toward a correct line of reasoning. Finally, they came up with a decoding trick called TIP (thought switching penalty) that encourages the model to stick with one line of thinking longer before jumping to another, so it explores each idea more thoroughly; a hedged code sketch of the idea follows below.
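
To make the idea concrete, here is a minimal sketch of what a thought-switching penalty could look like as a Hugging Face `LogitsProcessor`. This is not the authors' code: the switch-marker tokens, the penalty strength `alpha`, and the penalty window `beta` are illustrative assumptions.

```python
# A minimal sketch of a TIP-style thought-switching penalty at decoding time.
# NOTE: not the paper's implementation. The markers, `alpha`, and `beta` are
# assumptions made for illustration only.
import torch
from transformers import LogitsProcessor

class ThoughtSwitchingPenalty(LogitsProcessor):
    """Lower the logits of tokens that tend to open a new thought (e.g. "Alternatively")
    for the first `beta` generated tokens, so the model keeps developing its current
    line of reasoning instead of jumping to another one prematurely."""

    def __init__(self, switch_token_ids, alpha=3.0, beta=600, prompt_len=0):
        self.switch_token_ids = list(switch_token_ids)
        self.alpha = alpha            # penalty strength (assumed value)
        self.beta = beta              # number of generated tokens the penalty lasts (assumed value)
        self.prompt_len = prompt_len  # prompt length, so only generated tokens are counted

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        num_generated = input_ids.shape[1] - self.prompt_len
        if num_generated < self.beta:
            # Discourage thought-switching markers early in the solution.
            scores[:, self.switch_token_ids] -= self.alpha
        return scores
```

In use, `switch_token_ids` would be built by tokenizing markers such as "Alternatively", and the processor would be passed to `model.generate` inside a `LogitsProcessorList`; since the penalty only reshapes logits during decoding, no retraining is needed.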

Why it matters?

This matters because it helps make AI smarter and more reliable. By fixing the 'underthinking' problem, AI models can get better at solving complex problems without needing to be retrained, which saves time and resources. This could lead to AI that thinks more like humans do when tackling difficult questions, making it more useful for things like advanced math, scientific research, or any other task that requires deep, careful thinking. It's a step towards AI that doesn't just know a lot, but can also reason through problems more effectively.

Abstract

Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks by scaling test-time compute and exhibiting human-like deep thinking. However, we identify a phenomenon we term underthinking, where o1-like LLMs frequently switch between different reasoning thoughts without sufficiently exploring promising paths to reach a correct solution. This behavior leads to inadequate depth of reasoning and decreased performance, particularly on challenging mathematical problems. To systematically analyze this issue, we conduct experiments on three challenging test sets and two representative open-source o1-like models, revealing that frequent thought switching correlates with incorrect responses. We introduce a novel metric to quantify underthinking by measuring token efficiency in incorrect answers. To address underthinking, we propose a decoding strategy with thought switching penalty (TIP) that discourages premature transitions between thoughts, encouraging deeper exploration of each reasoning path. Experimental results demonstrate that our approach improves accuracy across challenging datasets without requiring model fine-tuning. Our findings contribute to understanding reasoning inefficiencies in o1-like LLMs and offer a practical solution to enhance their problem-solving capabilities.
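
As a rough illustration of how "token efficiency in incorrect answers" might be turned into a number, here is a hedged sketch. The splitting of a response into individual thoughts and the judge that decides whether a thought was on a correct path are assumptions for illustration, not the paper's exact metric.

```python
# A hypothetical token-efficiency style underthinking score over incorrect responses.
# Assumptions: each response is already split into "thoughts", and `thought_is_correct`
# is some judge (e.g. another LLM) that flags whether a thought was on a correct path.
def underthinking_score(incorrect_responses, count_tokens, thought_is_correct):
    """incorrect_responses: list of responses, each a list of thought strings."""
    per_response = []
    for thoughts in incorrect_responses:
        lengths = [count_tokens(t) for t in thoughts]
        total = sum(lengths)
        if total == 0:
            continue
        # Tokens spent up to and including the first thought that was already on a
        # correct path; if no such thought exists, all tokens count as useful effort.
        useful = total
        running = 0
        for thought, n in zip(thoughts, lengths):
            running += n
            if thought_is_correct(thought):
                useful = running
                break
        # A high score means many tokens were generated after a promising thought
        # had already appeared and was then abandoned.
        per_response.append(1.0 - useful / total)
    return sum(per_response) / len(per_response) if per_response else 0.0
```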