Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

Debargha Ganguly, Vikash Singh, Sreehari Sankar, Biyao Zhang, Xuecen Zhang, Srinivasan Iyengar, Xiaotian Han, Amit Sharma, Shivkumar Kalyanaraman, Vipin Chaudhary

2025-06-02

Summary

This paper looks at how to tell when large language models can be trusted to be correct on tasks that require strict, logical thinking, like writing computer code or formal rules.

What's the problem?

The problem is that even though large language models can generate answers that sound right, they sometimes make mistakes, especially in technical or logical tasks, and it's hard to know when their answers are actually reliable.

What's the solution?

The researchers introduced a new approach based on a probabilistic context-free grammar (PCFG) framework, which measures how uncertain the model's answers are. This makes it easier to spot likely errors and to decide when a human or a verifier should double-check the model's work.
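To make the idea concrete, here is a minimal sketch (not the paper's actual method) of one way a PCFG can turn several sampled model outputs into an uncertainty score: tally how often each grammar production is used across the samples, then sum the Shannon entropies of each nonterminal's production distribution. The function name `pcfg_uncertainty` and the toy rules are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def pcfg_uncertainty(rule_uses):
    """Estimate per-nonterminal production probabilities from observed
    (lhs, rhs) rule uses across sampled generations, then sum the
    Shannon entropies of those distributions as an uncertainty signal."""
    by_lhs = defaultdict(Counter)
    for lhs, rhs in rule_uses:
        by_lhs[lhs][rhs] += 1
    total = 0.0
    for counts in by_lhs.values():
        n = sum(counts.values())
        # Entropy written as sum of p * log2(1/p), so it is never negative.
        total += sum((c / n) * math.log2(n / c) for c in counts.values())
    return total

# Toy example: rule uses collected from five sampled formal outputs.
# If every sample expands "S" the same way, entropy is zero; when the
# samples disagree, the score rises, flagging the output for review.
consistent = [("S", "assert x > 0")] * 5
mixed = [("S", "assert x > 0")] * 3 + [("S", "assert x >= 0")] * 2
print(pcfg_uncertainty(consistent))  # 0.0
print(pcfg_uncertainty(mixed))       # ~ 0.971
```

The intuition is that a model which keeps producing the same grammatical structure is behaving predictably, while high entropy across samples signals disagreement worth verifying.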

Why does it matter?

This is important because it helps people and organizations use AI more safely in fields where mistakes are costly, such as programming, science, or law. Knowing when to trust the AI, and when to be extra careful, lets them verify only the answers that actually need it.

Abstract

This research explores uncertainty quantification in large language models for generating formal specifications, introducing a PCFG framework to improve error detection and selective verification.