S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu
2025-04-15
Summary
This paper introduces S1-Bench, a new benchmark designed to measure how well large AI reasoning models handle easy, everyday tasks that humans solve quickly and almost automatically, using what psychologists call 'System 1' or intuitive thinking.
What's the problem?
While large reasoning models are great at solving complex problems, they often struggle with simple tasks that should be fast and automatic. Instead of giving quick answers the way people do, these models tend to overthink and take too long, which makes them inefficient at basic tasks.
What's the solution?
The researchers created S1-Bench, a set of simple questions meant to test how efficiently these AI models can rely on intuitive thinking. Running the models through these tasks showed that they often waste time and compute by deliberating at length instead of just going with the obvious answer; a rough sketch of this kind of check appears below.
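To make the idea concrete, here is a minimal sketch, not the authors' code, of the kind of efficiency check described above: pose trivially easy questions to a model and measure how long its responses are. The `ask_model` helper is a hypothetical stand-in for a real inference API, and whitespace splitting is only a rough proxy for a real tokenizer.

```python
# A minimal sketch (not the authors' code) of the kind of efficiency
# check a benchmark like S1-Bench performs: pose trivially easy
# questions and measure how long the model's responses are.

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real inference call; replace this
    # with your model's API. Here it returns a canned answer so the
    # sketch runs on its own.
    return "5" if "2 + 3" in prompt else "apple"

def response_length(text: str) -> int:
    # Whitespace splitting as a rough proxy for a real tokenizer.
    return len(text.split())

simple_questions = [
    "What is 2 + 3?",
    "Which word comes first alphabetically: apple or banana?",
]

for question in simple_questions:
    answer = ask_model(question)
    n = response_length(answer)
    # A model with good System 1 behavior should answer questions like
    # these in a handful of tokens; a response running to hundreds of
    # tokens is a signal of overthinking.
    print(f"{question!r} -> {n} tokens")
```

In practice one would also check that the short answer is correct, since a model that answers quickly but wrongly is not exhibiting good intuitive thinking either.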
Why it matters?
This work matters because it shows that even the strongest AI models still fall short of human-like thinking in everyday situations. By highlighting these weaknesses, S1-Bench can help researchers build models that are not only good at hard problems but also quick and efficient on the easy stuff.
Abstract
S1-Bench evaluates the efficiency of Large Reasoning Models on simple tasks requiring intuitive thinking, revealing significant inefficiencies and a tendency toward unnecessary deliberation.