
Investigating Human-Aligned Large Language Model Uncertainty

Kyle Moore, Jesse Roberts, Daryl Watson, Pamela Wisniewski

2025-03-18

Summary

This paper explores different ways to measure uncertainty in large language models (LLMs) and how well these measures align with human uncertainty.

What's the problem?

It's important to understand how confident LLMs are in their answers, both to control their behavior and to calibrate user trust. However, existing uncertainty measures may not reflect how uncertain humans actually are about the same questions.

What's the solution?

The researchers investigated various uncertainty measures to see which ones best correlate with human group-level uncertainty. They found that Bayesian measures and a variation of entropy measures (top-k entropy) tend to agree with human behavior as model size increases. They also found that combining multiple uncertainty measures can match the human alignment of the best single measures while depending less on model size.
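The paper does not spell out its exact formulation, but the "top-k entropy" idea can be sketched as Shannon entropy computed over only the k most probable tokens, renormalized so they sum to 1. This is an illustrative assumption, not the authors' verbatim definition:

```python
import math

def top_k_entropy(probs, k=5):
    """Shannon entropy over the k most probable outcomes,
    renormalized so the kept probabilities sum to 1.
    Sketch of one plausible reading of 'top-k entropy'."""
    top = sorted(probs, reverse=True)[:k]
    total = sum(top)
    top = [p / total for p in top]
    # Skip zero-probability entries, where p*log(p) -> 0.
    return -sum(p * math.log(p) for p in top if p > 0)
```

Restricting to the top k tokens ignores the long tail of near-zero probabilities, which otherwise dominates full-vocabulary entropy for large vocabularies.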

Why it matters?

This work is important because it helps identify better ways to measure uncertainty in LLMs, leading to more reliable and trustworthy AI systems.

Abstract

Recent work has sought to quantify large language model uncertainty to facilitate model control and modulate user trust. Previous works focus on measures of uncertainty that are theoretically grounded or reflect the average overt behavior of the model. In this work, we investigate a variety of uncertainty measures in order to identify measures that correlate with human group-level uncertainty. We find that Bayesian measures and a variation on entropy measures, top-k entropy, tend to agree with human behavior as a function of model size. We find that some strong measures decrease in human-similarity with model size, but, by multiple linear regression, we find that combining multiple uncertainty measures provides comparable human-alignment with reduced size-dependency.
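The combination step mentioned in the abstract can be sketched as fitting human group-level uncertainty from several per-question model measures via ordinary least squares. The measure names and data below are hypothetical stand-ins, not the paper's actual dataset:

```python
import numpy as np

# Hypothetical setup: each row is one question; columns are
# synthetic stand-ins for several model uncertainty measures
# (e.g. top-k entropy, a Bayesian measure, full entropy).
rng = np.random.default_rng(0)
n = 200
X = rng.random((n, 3))

# Synthetic "human uncertainty" target built from assumed weights
# plus noise, purely for illustration.
true_w = np.array([0.6, 0.3, 0.1])
y = X @ true_w + 0.05 * rng.standard_normal(n)

# Multiple linear regression: add an intercept column and solve
# the least-squares problem for the combination weights.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The fitted `coef` holds the intercept followed by one weight per measure; a combined predictor then scores new questions as a weighted sum of their individual uncertainty measures.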