I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Roi Cohen, Konstantin Dobler, Eden Biran, Gerard de Melo
2024-12-12
Summary
This paper proposes a method to improve large language models (LLMs) by adding a special [IDK] token to their vocabulary, allowing the models to express uncertainty instead of generating incorrect information.
What's the problem?
Large language models often produce false information, a problem known as 'hallucination': the model generates text that sounds plausible but is factually incorrect. These errors can mislead users and erode trust in AI systems.
What's the solution?
The authors propose adding a special token, [IDK] (short for 'I don't know'), to the model's vocabulary. This token lets the model signal when it is unsure about an answer instead of guessing and potentially getting it wrong. They also introduce a training objective that shifts probability mass toward the [IDK] token when the model's prediction is likely to be incorrect, so the model learns to handle uncertainty while retaining most of its existing knowledge. A rough sketch of this idea is shown below.
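Below is a minimal PyTorch sketch of the general idea: a toy vocabulary is extended with one extra [IDK] entry, and the loss mixes the usual one-hot target with the [IDK] token whenever the model's current top prediction disagrees with the gold token. The vocabulary size, the placement of [IDK] at the end of the vocabulary, the mixing weight `idk_weight`, and the use of an argmax mismatch as a proxy for "incorrect prediction" are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

# Toy setup: a small vocabulary extended with one extra [IDK] token.
# These constants are illustrative choices, not values from the paper.
VOCAB_SIZE = 8          # original vocabulary size (toy example)
IDK_ID = VOCAB_SIZE     # [IDK] appended as the final vocabulary entry


def idk_loss(logits: torch.Tensor, targets: torch.Tensor, idk_weight: float = 0.5) -> torch.Tensor:
    """Cross-entropy against a soft target that moves probability mass to [IDK]
    whenever the model's current top prediction is wrong.

    logits:  (batch, vocab_size + 1) -- scores over the extended vocabulary
    targets: (batch,)                -- gold token ids (never [IDK] itself)
    """
    extended_vocab = logits.size(-1)
    # One-hot distribution over the gold tokens.
    soft_targets = F.one_hot(targets, num_classes=extended_vocab).float()

    # Proxy for "incorrect prediction": the current argmax disagrees with the gold token.
    wrong = logits.argmax(dim=-1) != targets

    # For wrong predictions, shift `idk_weight` of the target mass onto [IDK].
    soft_targets[wrong] *= (1.0 - idk_weight)
    soft_targets[wrong, IDK_ID] += idk_weight

    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(4, VOCAB_SIZE + 1, requires_grad=True)
    targets = torch.randint(0, VOCAB_SIZE, (4,))
    loss = idk_loss(logits, targets)
    loss.backward()
    print(f"IDK-aware loss: {loss.item():.4f}")
```

With an actual pretrained model, the same idea would involve adding [IDK] to the tokenizer, resizing the model's embedding matrix, and then continuing training with a loss of this kind.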
Why it matters?
This approach is important because it enhances the reliability of language models. When a model can express uncertainty, users can better judge when to trust the information it provides. This could lead to safer and more effective use of AI in applications such as chatbots and automated writing tools.
Abstract
Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we propose a novel calibration method that can be used to combat hallucinations. We add a special [IDK] ("I don't know") token to the model's vocabulary and introduce an objective function that shifts probability mass to the [IDK] token for incorrect predictions. This approach allows the model to express uncertainty in its output explicitly. We evaluate our proposed method across multiple model architectures and factual downstream tasks. We find that models trained with our method are able to express uncertainty in places where they would previously make mistakes while suffering only a small loss of encoded knowledge. We further perform extensive ablation studies of multiple variations of our approach and provide a detailed analysis of the precision-recall tradeoff of our method.
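The abstract describes an objective that shifts probability mass to the [IDK] token for incorrect predictions. One way such a target-mixing objective could be written down is sketched below; the mixing coefficient λ_t and its dependence on prediction correctness are illustrative assumptions consistent with the abstract, not the paper's exact formulation.

```latex
% Illustration of a target-mixing objective (not necessarily the paper's exact loss):
% the one-hot target e_{y_t} is replaced by a soft target that mixes in the [IDK] token.
\[
  \tilde{y}_t \;=\; (1 - \lambda_t)\, e_{y_t} \;+\; \lambda_t\, e_{\texttt{[IDK]}},
  \qquad
  \mathcal{L} \;=\; -\sum_t \tilde{y}_t^{\top} \log p_\theta(\,\cdot \mid x_{<t}),
\]
% where \lambda_t \in [0, 1] is zero when the model's prediction at step t is correct
% (recovering standard cross-entropy) and grows as the prediction becomes less reliable.
```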