REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

Chuyi Kong, Gao Wei, Jing Ma, Hongzhan Lin, Yaxin Fan

2025-12-05

Summary

This paper introduces a new method, called REFLEX, for automatically checking the accuracy of information on social media using large language models. It focuses on making these systems more reliable, faster, and better at explaining *why* something is true or false.

What's the problem?

Currently, fact-checking systems powered by large language models often struggle because they rely too much on looking up information from outside sources. This makes them slow, and they can sometimes fabricate information (a problem known as hallucination). It is also hard to understand *how* they arrived at a conclusion, which matters for trusting the result, especially when quick responses are needed to counter misinformation spreading online.

What's the solution?

REFLEX tackles this by teaching the language model to fact-check through a kind of internal reasoning process, framed as a role-play dialogue with itself. It doesn't need to constantly search for external information. Instead, it learns to identify the core 'truth' within its existing knowledge and then explain that truth clearly. It does this by comparing the internal activations of the original model with those of a version fine-tuned for fact-checking, and uses those activation differences to build 'steering vectors' that guide its reasoning and suppress noisy explanations. Remarkably, it needs only a small amount of training data (465 self-refined samples) to reach state-of-the-art performance.
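The contrastive comparison described above can be sketched in a few lines. Assuming we have hidden-state activations collected from the backbone model and its fine-tuned variant on the same inputs, a steering direction can be formed as the averaged activation difference. This is a minimal mean-difference sketch, not the paper's adaptive extraction procedure; the function name and the plain-list representation of activations are illustrative assumptions.

```python
def build_steering_vector(base_acts, tuned_acts):
    """Mean per-example activation difference (tuned - base).

    base_acts, tuned_acts: lists of equal-length activation vectors,
    one pair per input, taken from the same layer of each model.
    """
    n = len(base_acts)
    dim = len(base_acts[0])
    return [
        sum(tuned_acts[i][d] - base_acts[i][d] for i in range(n)) / n
        for d in range(dim)
    ]

# Toy example: two inputs, two activation dimensions.
v = build_steering_vector(
    base_acts=[[0.0, 0.0], [0.0, 0.0]],
    tuned_acts=[[1.0, 1.0], [3.0, 3.0]],
)
# v points in the direction the fine-tuning moved the activations.
```

The intuition is that fine-tuning for fact-checking shifts the model's internal representations, and the average shift captures a reusable "truth" direction without any external retrieval.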

Why it matters?

This research is important because it offers a way to build faster, more trustworthy fact-checking systems. By relying less on external sources and focusing on internal reasoning, REFLEX can help combat the spread of misinformation more effectively, and the ability to explain *why* a claim is true or false builds confidence in the system's results. The fact that it works well with limited data is also a big advantage, making it more practical to implement.

Abstract

The prevalence of misinformation on social media threatens public trust, demanding automated fact-checking systems that provide accurate verdicts with interpretable explanations. However, existing large language model-based (LLM-based) approaches often rely heavily on external knowledge sources, introducing substantial latency and even hallucinations that undermine reliability, interpretability, and responsiveness, which are crucial for real-time use. To address these challenges, we propose the REason-guided Fact-checking with Latent EXplanations (REFLEX) paradigm, a plug-and-play, self-refining approach that leverages the internal knowledge in the backbone model to improve both verdict accuracy and explanation quality. REFLEX reformulates fact-checking as a role-play dialogue and jointly trains verdict prediction and explanation generation. It adaptively extracts contrastive activation pairs between the backbone model and its fine-tuned variant to construct steering vectors that naturally disentangle truth into style and substance. These activation-level signals guide inference and suppress noisy explanations, enabling more faithful and efficient reasoning. Experiments on real-world datasets show that REFLEX outperforms previous methods that steer toward a single truth direction and underscores the challenge traditional approaches face when handling the subtle, human-unknown truth in fact-checking tasks. Remarkably, with only 465 self-refined training samples, REFLEX achieves state-of-the-art performance. Furthermore, models trained with explanatory objectives can effectively guide those without them, yielding up to a 7.57% improvement, highlighting that internal explanation signals play a dual role in both interpreting and enhancing factual reasoning.
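The abstract's key contrast with prior work is steering along *two* disentangled directions (substance and style) rather than a single truth direction. As a rough illustration of how such disentangled directions might be applied at inference time, the sketch below shifts a hidden state by separately scaled substance and style vectors. The function name, the coefficients, and the plain-list representation are all illustrative assumptions, not the paper's implementation.

```python
def steer(hidden, substance_vec, style_vec, a_sub=1.0, a_sty=0.5):
    """Shift a hidden state along two disentangled steering directions.

    substance_vec nudges the factual content of the verdict;
    style_vec nudges how the explanation is phrased. The scaling
    coefficients a_sub and a_sty are assumed tunable knobs.
    """
    return [
        h + a_sub * s + a_sty * t
        for h, s, t in zip(hidden, substance_vec, style_vec)
    ]

# Toy example: a 2-dimensional hidden state.
out = steer(
    hidden=[1.0, 1.0],
    substance_vec=[1.0, 0.0],
    style_vec=[0.0, 2.0],
    a_sub=1.0,
    a_sty=0.5,
)
```

Keeping the two directions separate is what lets the method tune verdict accuracy and explanation quality independently, where a single-direction steer would conflate them.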