If We May De-Presuppose: Robustly Verifying Claims through Presupposition-Free Question Decomposition
Shubhashis Roy Dipta, Francis Ferraro
2025-08-26
Summary
This paper investigates how reliably large language models (LLMs) can verify whether a statement is true or false, particularly when the verification relies on questions the models generate themselves.
What's the problem?
LLMs sometimes embed hidden assumptions (presuppositions) in the questions they generate, which can lead to incorrect conclusions when checking whether a claim is valid. These models are also very sensitive to *how* a question is asked: even slight changes in wording can produce significantly different results, making their performance inconsistent and unreliable. While recent models have narrowed this gap, the study shows that this sensitivity persists.
What's the solution?
The researchers developed a new system for verifying claims that breaks down complex questions into simpler, more direct parts, and ensures these simpler questions don't rely on unproven assumptions. This structured approach helps the LLM reason more clearly and consistently.
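To make the idea concrete, here is a minimal, hypothetical sketch of decompose-then-verify with presupposition-free sub-questions. The function names (`decompose_claim`, `answer_question`, `verify`) and the example claim are illustrative, not from the paper; a tiny fact table stands in for the LLM calls so the example is self-contained.

```python
# Toy stand-in for an LLM's knowledge: maps yes/no questions to answers.
FACTS = {
    "Did Marie Curie win a Nobel Prize?": True,
    "Did Marie Curie win the Nobel Prize in Physics?": True,
    "Did Marie Curie win the Nobel Prize in Chemistry?": True,
}

def decompose_claim(claim: str) -> list[str]:
    """Return presupposition-free yes/no sub-questions for a claim.

    A presupposing question like "In which two years did Curie win her
    Physics prizes?" already assumes she won two Physics prizes. The
    decomposition instead checks each assumption explicitly, starting
    from existence. (In the paper's framework an LLM would generate
    these; this stub returns a fixed decomposition for the demo claim.)
    """
    return [
        "Did Marie Curie win a Nobel Prize?",  # existence check first
        "Did Marie Curie win the Nobel Prize in Physics?",
        "Did Marie Curie win the Nobel Prize in Chemistry?",
    ]

def answer_question(question: str) -> bool:
    # Stand-in for an LLM answerer; unknown questions default to False.
    return FACTS.get(question, False)

def verify(claim: str) -> bool:
    # The claim is supported only if every sub-question checks out.
    return all(answer_question(q) for q in decompose_claim(claim))

claim = "Marie Curie won Nobel Prizes in both Physics and Chemistry."
print(verify(claim))  # True under the toy fact table
```

The key design point is that each sub-question is answerable on its own and fails safely: if an assumption does not hold, its dedicated sub-question returns False and the whole claim is rejected, rather than the assumption slipping in unverified.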
Why does it matter?
This work is important because it addresses a key weakness in LLMs: their tendency to be unreliable and easily swayed by minor changes in input. By making claim verification more robust, we can build more trustworthy AI systems that provide more accurate and consistent information; the proposed method improves performance by up to 2-5%.
Abstract
Prior work has shown that presupposition in generated questions can introduce unverified assumptions, leading to inconsistencies in claim verification. Additionally, prompt sensitivity remains a significant challenge for large language models (LLMs), resulting in performance variance as high as 3-6%. While recent advancements have reduced this gap, our study demonstrates that prompt sensitivity remains a persistent issue. To address this, we propose a structured and robust claim verification framework that reasons through presupposition-free, decomposed questions. Extensive experiments across multiple prompts, datasets, and LLMs reveal that even state-of-the-art models remain susceptible to prompt variance and presupposition. Our method consistently mitigates these issues, achieving up to a 2-5% improvement.