Investigating Hallucination in Conversations for Low Resource Languages
Amit Das, Md. Najib Hasan, Souvika Sarkar, Zheng Zhang, Fatemeh Jamshidi, Tathagata Bhattacharya, Nilanjana Raychawdhury, Dongji Feng, Vinija Jain, Aman Chadha
2025-08-04
Summary
This paper examines how large language models (LLMs) produce fewer hallucinations, i.e., believable but fabricated statements, when generating conversational responses in Mandarin than in lower-resource languages such as Hindi and Farsi, a pattern that holds across multiple models.
What's the problem?
LLMs sometimes generate false or made-up information that sounds believable, a failure known as hallucination. Hallucination is worse in some languages than in others, especially languages with less training data available.
What's the solution?
The paper measures and compares how often hallucinations occur in conversational responses generated in low-resource languages such as Hindi and Farsi versus Mandarin. Across multiple models, responses in Mandarin contain fewer hallucinations, which helps clarify how a language's resource level relates to model limitations.
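As an illustration only, not the paper's actual evaluation pipeline, the kind of per-language comparison described above could be tallied from responses that have already been labeled as hallucinated or faithful. The record format, field names, and model names in this sketch are assumptions.

```python
from collections import defaultdict

# Hypothetical labeled records: each conversational response has been judged
# as hallucinated (True) or faithful (False). Values are illustrative.
records = [
    {"model": "model-A", "language": "Mandarin", "hallucinated": False},
    {"model": "model-A", "language": "Hindi",    "hallucinated": True},
    {"model": "model-A", "language": "Farsi",    "hallucinated": True},
    {"model": "model-B", "language": "Mandarin", "hallucinated": False},
    {"model": "model-B", "language": "Hindi",    "hallucinated": False},
    {"model": "model-B", "language": "Farsi",    "hallucinated": True},
]

def hallucination_rates(records):
    """Return the fraction of hallucinated responses per (model, language)."""
    counts = defaultdict(lambda: [0, 0])  # (model, language) -> [hallucinated, total]
    for r in records:
        key = (r["model"], r["language"])
        counts[key][0] += int(r["hallucinated"])
        counts[key][1] += 1
    return {key: hall / total for key, (hall, total) in counts.items()}

for (model, language), rate in sorted(hallucination_rates(records).items()):
    print(f"{model:8s} {language:9s} hallucination rate = {rate:.2f}")
```

Comparing these rates across languages for the same model is one simple way to see whether Mandarin responses hallucinate less often than Hindi or Farsi responses.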
Why it matters?
This matters because knowing which languages suffer more hallucinations helps researchers improve AI models, making them more reliable and useful for speakers of a wider range of languages.
Abstract
LLMs generate fewer hallucinations in Mandarin than in Hindi and Farsi across multiple models.