LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting

2024-12-23

Summary

This paper introduces M-ALERT, a new benchmark designed to evaluate the safety of large language models (LLMs) in five languages: English, French, German, Italian, and Spanish. It aims to uncover the safety gaps that appear when these models are used across different languages.

What's the problem?

Ensuring that LLMs are safe and reliable across languages is important, but most existing safety evaluations focus on a single language, usually English. This can hide significant safety issues: a model that responds safely to a prompt in English may produce unsafe or inappropriate responses to an equivalent prompt in another language.

What's the solution?

M-ALERT addresses this issue by providing a comprehensive evaluation framework with 15,000 high-quality prompts per language (75,000 in total), organized according to the detailed ALERT safety taxonomy. The authors used it to evaluate ten state-of-the-art LLMs and analyze their safety behavior in each language. They found that models often behave inconsistently across languages and safety categories, highlighting the need for better multilingual safety practices; a rough sketch of this kind of per-language, per-category evaluation is shown below.
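To make the setup concrete, here is a minimal sketch, not the authors' code, of how one might aggregate per-language, per-category safety rates from judged model responses. The record fields and the judge_is_safe function are hypothetical placeholders standing in for whatever prompt metadata and safety judge a benchmark like M-ALERT would use.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# aggregate safety rates per (language, category) from judged responses.
from collections import defaultdict

def safety_report(records, judge_is_safe):
    """records: iterable of dicts with 'language', 'category', 'response'.
    judge_is_safe: callable returning True if a response is judged safe."""
    totals = defaultdict(int)  # (language, category) -> prompts evaluated
    safe = defaultdict(int)    # (language, category) -> responses judged safe
    for r in records:
        key = (r["language"], r["category"])
        totals[key] += 1
        if judge_is_safe(r["response"]):
            safe[key] += 1
    # Safety rate per (language, category); cross-lingual gaps show up as
    # rates that differ sharply for the same category across languages.
    return {key: safe[key] / totals[key] for key in totals}
```

In such a report, a category like crime_tax scoring well in English but poorly in Italian would surface exactly the kind of inconsistency the paper describes.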

Why it matters?

This research is important because it helps improve the safety and reliability of AI systems used around the world. By exposing language-specific safety gaps in LLMs, M-ALERT gives developers a way to find and fix them, so that users can interact with these models more safely and responsibly regardless of the language they speak.

Abstract

Building safe Large Language Models (LLMs) across multiple languages is essential in ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.