ClaimIQ at CheckThat! 2025: Comparing Prompted and Fine-Tuned Language Models for Verifying Numerical Claims
Anirban Saha Anik, Md Fahimul Kabir Chowdhury, Andrew Wyckoff, Sagnik Ray Choudhury
2025-09-16
Summary
This paper describes a system that automatically checks whether statements about numbers and dates are true, using evidence retrieved from the web.
What's the problem?
The core issue is automatically verifying whether claims involving numbers or specific time periods are supported by evidence. It is hard for computers to judge whether a source confirms or refutes a numerical or temporal statement, and current systems struggle to generalize to new, unseen data.
What's the solution?
The researchers tried two complementary approaches. First, they prompted powerful instruction-tuned AI language models to check the claims directly, without any task-specific training. Second, they fine-tuned a language model on the task using LoRA, a technique that adapts the model without retraining all of its parameters. They also experimented with different ways of providing evidence to the model, such as giving it the whole document or only the most relevant sentences.
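The "most relevant sentences" step can be sketched as a simple BM25 ranker over a document's sentences. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer and the `top_k_sentences` helper are invented here for the sketch, and the paper's system also considers a MiniLM-based retriever.

```python
import math
from collections import Counter

def bm25_scores(query, sentences, k1=1.5, b=0.75):
    """Score each sentence against the query with Okapi BM25.

    Tokenization is naive lowercase whitespace splitting (an
    assumption for this sketch, not the paper's pipeline).
    """
    tokenized = [s.lower().split() for s in sentences]
    query_terms = query.lower().split()
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many sentences each term appears.
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def top_k_sentences(claim, document_sentences, k=3):
    """Return the k sentences most relevant to the claim."""
    scores = bm25_scores(claim, document_sentences)
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i], reverse=True)
    return [document_sentences[i] for i in ranked[:k]]
```

The selected sentences would then be concatenated into the prompt or fine-tuning input in place of the full document, which is the evidence-granularity trade-off the paper studies.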
Why does it matter?
This work matters because automatic fact verification is crucial in a world where misinformation spreads quickly. Improving how computers handle numerical and temporal claims makes fact-checking more efficient and reliable. The findings also show that two factors are key to success: how evidence is presented to the model, and how well the model adapts to new situations.
Abstract
This paper presents our system for Task 3 of the CLEF 2025 CheckThat! Lab, which focuses on verifying numerical and temporal claims using retrieved evidence. We explore two complementary approaches: zero-shot prompting with instruction-tuned large language models (LLMs) and supervised fine-tuning with parameter-efficient LoRA. To enhance evidence quality, we investigate several selection strategies, including full-document input and top-k sentence filtering using BM25 and MiniLM. Our best-performing model, LLaMA fine-tuned with LoRA, achieves strong performance on the English validation set. However, a notable performance drop on the test set highlights a generalization challenge. These findings underscore the importance of evidence granularity and model adaptation for robust numerical fact verification.