
Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese

Hanjia Lyu, Jiebo Luo, Jian Kang, Allison Koenecke

2025-05-29

Summary

This paper looks at how large language models, like the ones behind chatbots, can treat Simplified Chinese and Traditional Chinese differently, especially when choosing words and names that are common in different regions.

What's the problem?

The problem is that these AI models can be biased toward whichever Chinese script dominated their training data. That means they may be less accurate or less fair when working in one script than the other, which affects users who prefer or need a specific form.

What's the solution?

The researchers benchmarked how these models perform on tasks in both Simplified and Traditional Chinese, looking at whether the models pick the regional terms and names appropriate to each script. They found that the performance differences are driven mainly by the composition of the training data and by how each script is broken down into tokens for the model to process (see the sketch below).
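To make the tokenization point concrete, here is a minimal sketch (not the paper's benchmark code) that compares how a common BPE tokenizer splits the same sentence written with the Simplified Chinese regional term 软件 versus the Traditional Chinese term 軟體 for "software". It assumes the tiktoken library and its cl100k_base encoding; the example sentences are illustrative and not drawn from the paper.

```python
# Sketch: how the same sentence can tokenize differently across scripts.
# Assumes the `tiktoken` library; sentences are hypothetical examples.
import tiktoken

# cl100k_base is the BPE encoding used by GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

# "This software is easy to use", written with the regional term for
# "software": 软件 (Simplified, mainland China) vs. 軟體 (Traditional, Taiwan).
sentences = {
    "Simplified": "这个软件很好用",
    "Traditional": "這個軟體很好用",
}

for label, text in sentences.items():
    token_ids = enc.encode(text)  # encode the sentence into BPE token IDs
    print(f"{label}: {len(token_ids)} tokens -> {token_ids}")
```

If one script consistently needs more tokens per sentence, it gets less effective context and may be modeled less reliably, which is one mechanism behind the performance gaps the paper attributes to tokenization.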

Why it matters?

This is important because it shows that language models need to be carefully checked and improved so they can serve speakers of both Simplified and Traditional Chinese equally well. Making AI fairer and more accurate for everyone helps ensure that technology is inclusive and useful for people from different backgrounds.

Abstract

Research examines LLM performance biases between Simplified and Traditional Chinese in regional term and name choice tasks, attributing differences to training data and tokenization.