IberBench: LLM Evaluation on Iberian Languages
José Ángel González, Ian Borrego Obrador, Álvaro Romo Herrero, Areg Mikael Sarvazyan, Mara Chinea-Ríos, Angelo Basile, Marc Franco-Salvador
2025-04-25
Summary
This paper introduces IberBench, a new benchmark designed to evaluate how well large language models handle the languages spoken in the Iberian Peninsula and Ibero-America, such as Spanish, Portuguese, Catalan, and others.
What's the problem?
Most language models are trained and evaluated primarily on English or a handful of other major languages, so it is unclear whether they perform as well on languages that are rarely tested. This leaves many speakers of those languages underserved.
What's the solution?
The researchers developed IberBench, which covers a wide variety of tasks in several Iberian and Ibero-American languages. Using this benchmark, they can fairly compare how well different models handle these languages and pinpoint where improvements are needed.
Why does it matter?
This work helps ensure that language technology is fair and useful for speakers of all languages, not just English, supporting better communication, education, and access to information for millions of people worldwide.
Abstract
IberBench is a comprehensive benchmark for evaluating Large Language Models on diverse tasks across languages spoken in the Iberian Peninsula and Ibero-America, addressing limitations of current evaluation practices.