When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction

Yuqing Yang, Robin Jia

2025-05-23

Summary

This paper examines how often large language models, like chatbots, retract answers they got wrong, and what can help them get better at admitting their mistakes.

What's the problem?

The problem is that these models usually stick to their answers even when they're wrong, especially when they internally 'believe' their answer is factually correct. This makes it hard for people to trust them completely.

What's the solution?

The researchers found that supervised fine-tuning, a way of training models on many examples of the desired behavior, helps the models recognize when an answer is wrong and retract it. It works by refining the models' internal beliefs about whether their answers are actually correct.
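
To make the idea concrete, here is a minimal sketch of what supervised fine-tuning on retraction examples could look like using the Hugging Face transformers library. The model name ("gpt2"), the toy training example, and the hyperparameters are illustrative assumptions, not the paper's actual setup.

```python
# A minimal sketch of supervised fine-tuning (SFT) on retraction examples.
# Model name, dataset format, and hyperparameters are illustrative
# assumptions, not the paper's actual configuration.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder; the paper studies larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical training example: a wrong answer followed by the desired
# retraction, so the model learns to admit and correct the mistake.
examples = [
    {
        "text": (
            "Q: What is the capital of Australia?\n"
            "A: Sydney. Wait, that's wrong -- I retract that answer. "
            "The capital of Australia is Canberra."
        )
    },
]

dataset = Dataset.from_list(examples).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="retraction-sft",
        num_train_epochs=1,
        per_device_train_batch_size=1,
    ),
    train_dataset=dataset,
    # Standard causal-LM collator: pads batches and copies inputs to labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice the training data would contain many such pairs of mistaken answers and retractions, so the model learns the general behavior rather than memorizing individual corrections.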

Why it matters?

This matters because it makes AI more honest and trustworthy, which is really important if people are going to rely on these systems for information or advice.

Abstract

LLMs rarely retract incorrect answers they believe to be factually correct, but supervised fine-tuning can improve their retraction performance by refining their internal beliefs.