SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking

Nam V. Nguyen, Dien X. Tran, Thanh T. Tran, Anh T. Hoang, Tai V. Duong, Di T. Le, Phuc-Lu Le

2025-03-05


Summary

This paper presents SemViQA, a new AI system for fact-checking information written in Vietnamese that is more accurate and faster than previous methods.

What's the problem?

With the rise of AI language models, more false information is spreading online, especially in languages like Vietnamese that have few resources for fact-checking. Current methods struggle to understand what words mean in context (for example, homonyms) and to handle complex sentence structures, and they often have to trade accuracy for speed.

What's the solution?

The researchers created SemViQA, which combines two main parts: one that finds relevant evidence based on meaning (Semantic-based Evidence Retrieval), and another that decides whether a statement is true or false in two steps (Two-step Verdict Classification). The system balances accuracy and speed, reaching state-of-the-art strict accuracy of 78.97% on ISE-DSC01 and 80.82% on ViWikiFC and taking first place in the UIT Data Science Challenge. They also built a faster variant, SemViQA Faster, that runs inference seven times faster while keeping competitive accuracy.
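To make the two-part design concrete, here is a minimal Python sketch of the data flow: retrieve the most relevant evidence sentence, then decide the verdict in two steps. It is not the authors' implementation. The retriever scores sentences with cosine similarity over word counts, a plain lexical stand-in for a real semantic model, and `toy_sufficiency` and `toy_stance` are hypothetical heuristics standing in for trained classifiers. The split of the two steps (first "is there enough information?", then SUPPORTED vs REFUTED), the label names, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
import re
import unicodedata
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical toy list of Vietnamese negation words used by the stand-in stance scorer.
NEGATIONS = {unicodedata.normalize("NFC", w) for w in ("không", "chẳng", "chưa")}

Scorer = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    # NFC-normalise so precomposed Vietnamese letters match \w, then lowercase.
    return re.findall(r"\w+", unicodedata.normalize("NFC", text).lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_evidence(claim: str, context: str) -> Tuple[str, float]:
    # Part 1 (lexical stand-in for semantic retrieval): score every sentence
    # of the context against the claim and return the best one with its score.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    if not sentences:
        return "", 0.0
    claim_vec = Counter(tokenize(claim))
    scored = [(s, cosine(claim_vec, Counter(tokenize(s)))) for s in sentences]
    return max(scored, key=lambda pair: pair[1])


def two_step_verdict(claim: str, evidence: str, sufficiency: Scorer, stance: Scorer,
                     threshold: float = 0.5) -> str:
    # Step 1 (assumed): is the evidence decisive for this claim at all?
    if sufficiency(claim, evidence) < threshold:
        return "NEI"  # not enough information
    # Step 2 (assumed): only for decidable claims, pick one of the two remaining labels.
    return "SUPPORTED" if stance(claim, evidence) >= threshold else "REFUTED"


def toy_sufficiency(claim: str, evidence: str) -> float:
    # Hypothetical stand-in for a trained classifier: lexical overlap only.
    return cosine(Counter(tokenize(claim)), Counter(tokenize(evidence)))


def toy_stance(claim: str, evidence: str) -> float:
    # Hypothetical stand-in: "agree" when both or neither side contains a negation word.
    negated_claim = bool(NEGATIONS & set(tokenize(claim)))
    negated_evidence = bool(NEGATIONS & set(tokenize(evidence)))
    return 1.0 if negated_claim == negated_evidence else 0.0


def fact_check(claim: str, context: str) -> Tuple[str, str]:
    # Full pipeline: retrieval first, then the two-step verdict on the retrieved sentence.
    evidence, _ = retrieve_evidence(claim, context)
    return evidence, two_step_verdict(claim, evidence, toy_sufficiency, toy_stance)


CONTEXT = "Hà Nội là thủ đô của Việt Nam. Thành phố Hồ Chí Minh là thành phố lớn nhất."

if __name__ == "__main__":
    for claim in ("Hà Nội là thủ đô của Việt Nam.",
                  "Hà Nội không phải là thủ đô của Việt Nam.",
                  "Đà Nẵng có nhiều bãi biển đẹp."):
        print(fact_check(claim, CONTEXT))
```

Keeping retrieval and verdict as separate functions mirrors the two-part structure described above: either stage could be swapped for a stronger model without touching the other.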

Why it matters?

This matters because it helps fight the spread of false information in Vietnamese, which is crucial in the age of AI-generated content. By being both accurate and fast, SemViQA can help verify information more efficiently, potentially improving the quality of online information for Vietnamese speakers. It also shows how AI can be used to solve problems in languages that don't have as many technological resources as English or Chinese.

Abstract

The rise of misinformation, exacerbated by Large Language Models (LLMs) like GPT and Gemini, demands robust fact-checking solutions, especially for low-resource languages like Vietnamese. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, often trading accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework integrating Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed 7x while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation. The source code is available at: https://github.com/DAVID-NGUYEN-S16/SemViQA.
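Strict accuracy, the headline metric in the abstract, is harder to score well on than plain verdict accuracy. The sketch below assumes the convention of the ISE-DSC01 challenge as we understand it: a prediction counts only if the verdict is correct and, for SUPPORTED or REFUTED claims, the predicted evidence also matches the gold evidence, while NEI claims need no evidence. The `Prediction` type, the label names, and exact-string evidence matching are illustrative assumptions, not the official scorer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Prediction:
    verdict: str             # "SUPPORTED", "REFUTED" or "NEI" (assumed label set)
    evidence: Optional[str]  # evidence sentence; None when the verdict is NEI


def _check(gold: Sequence[Prediction], pred: Sequence[Prediction]) -> None:
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")


def verdict_accuracy(gold: Sequence[Prediction], pred: Sequence[Prediction]) -> float:
    # Fraction of claims whose label alone is right; evidence is ignored.
    _check(gold, pred)
    if not gold:
        return 0.0
    return sum(g.verdict == p.verdict for g, p in zip(gold, pred)) / len(gold)


def strict_accuracy(gold: Sequence[Prediction], pred: Sequence[Prediction]) -> float:
    # Assumed rule: right label AND, unless the gold label is NEI, right evidence.
    _check(gold, pred)
    if not gold:
        return 0.0
    correct = 0
    for g, p in zip(gold, pred):
        if g.verdict != p.verdict:
            continue  # a wrong label never counts
        if g.verdict == "NEI" or g.evidence == p.evidence:
            correct += 1
    return correct / len(gold)


gold = [
    Prediction("SUPPORTED", "e1"),
    Prediction("REFUTED", "e2"),
    Prediction("NEI", None),
    Prediction("SUPPORTED", "e3"),
]
pred = [
    Prediction("SUPPORTED", "e1"),  # right label, right evidence
    Prediction("REFUTED", "e9"),    # right label, wrong evidence
    Prediction("NEI", None),        # right label, no evidence needed
    Prediction("REFUTED", "e3"),    # wrong label
]
print(verdict_accuracy(gold, pred))  # 0.75
print(strict_accuracy(gold, pred))   # 0.5
```

In this toy example, the second claim gets the right label from the wrong sentence. That is enough for verdict accuracy but not for strict accuracy, which is why strict accuracy also rewards good evidence retrieval.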
