MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations

Ernests Lavrinovics, Russa Biswas, Katja Hose, Johannes Bjerva

2025-05-22

Summary

This paper introduces MultiHal, a new dataset that helps researchers test how well large language models avoid making up facts, or 'hallucinating,' across different languages by checking their answers against real-world knowledge graphs.

What's the problem?

Large language models sometimes give answers that sound convincing but are actually false, and these mistakes are especially hard to catch when a model is working in multiple languages or needs to connect several pieces of information to produce an answer.

What's the solution?

The researchers built a benchmark grounded in knowledge graphs, which are like giant maps of verified facts. It covers many languages and includes questions that require the model to connect several facts (multihop reasoning), so hallucinations can be measured, and ultimately reduced, more reliably.
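To make the idea concrete, here is a minimal sketch (not the paper's actual code) of how a knowledge graph can ground a multihop fact check: the graph is a toy set of (subject, relation) → object triples, and an answer counts as grounded only if following the chain of relations through the graph lands on that answer. All names (`KG`, `multihop_lookup`, `is_grounded`) are illustrative assumptions.

```python
# Toy knowledge graph: (subject, relation) -> object triples.
# Illustrative sketch only; MultiHal uses real, much larger graphs.
KG = {
    ("Paris", "capital_of"): "France",
    ("France", "official_language"): "French",
}

def multihop_lookup(start, relations):
    """Follow a chain of relations through the graph; None if any hop is missing."""
    node = start
    for rel in relations:
        node = KG.get((node, rel))
        if node is None:
            return None
    return node

def is_grounded(answer, start, relations):
    """An answer is grounded if it matches the fact derived from the graph."""
    fact = multihop_lookup(start, relations)
    return fact is not None and answer.strip().lower() == fact.lower()

# "What is the official language of the country Paris is the capital of?"
# Requires connecting two facts, i.e. a two-hop check.
print(is_grounded("French", "Paris", ["capital_of", "official_language"]))  # True
print(is_grounded("German", "Paris", ["capital_of", "official_language"]))  # False
```

Because every verdict traces back to explicit triples, this kind of check works the same way regardless of the language the question was asked in, which is what makes graph grounding attractive for multilingual evaluation.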

Why it matters?

This matters because it helps make AI more trustworthy and accurate worldwide, ensuring models give correct information no matter what language they are working in.

Abstract

A multilingual, multihop benchmark using knowledge graphs for evaluating and mitigating hallucinations in large language models.