RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages

Gabriel Chua, Leanne Tan, Ziyu Ge, Roy Ka-Wei Lee

2025-07-10

Summary

This paper introduces RabakBench, a new safety benchmark designed to test and improve AI models' ability to handle the languages spoken in Singapore, especially those with few resources available for training and evaluation.

What's the problem?

Many languages lack the data and tools needed to evaluate AI models for safety. As a result, harmful or inappropriate content in these languages can go undetected, weakening user protection and trust.

What's the solution?

The researchers built RabakBench by scaling up human annotations to create detailed, localized safety tests across several of Singapore's languages. These tests let AI models be evaluated, and improved, at handling different languages safely, even where little existing data is available.

Why it matters?

This matters because it makes AI safer and more inclusive for speakers of less commonly supported languages, protecting more people from harmful content and improving AI fairness worldwide.

Abstract

RabakBench is a multilingual safety benchmark for Singapore's languages, enabling robust safety evaluation and dataset creation in low-resource environments.