Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
Amr Mohamed, Yang Zhang, Michalis Vazirgiannis, Guokan Shang
2025-06-25
Summary
This paper examines how large language models understand and process text that switches between two or more languages within a single passage, a practice called code-switching.
What's the problem?
The problem is that large language models struggle to fully understand or reason about text in which different languages are mixed together, even though they perform well on monolingual text. Interestingly, embedding English words into text written in other languages can sometimes improve model performance.
What's the solution?
The researchers tested how different prompting strategies and fine-tuning affect the models' ability to handle code-switched text. They found that these methods mitigate the performance degradation to differing degrees.
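To picture the setup, here is a minimal sketch of how code-switched inputs can be constructed by embedding English words into a sentence in another language. This is a hypothetical illustration of the general technique, not the paper's actual pipeline; the function name, word list, and translation table are all invented for this example.

```python
def code_switch(tokens, translations, switch_positions):
    """Replace tokens at the given positions with their English
    translations, simulating intra-sentence code-switching."""
    switched = list(tokens)
    for i in switch_positions:
        if tokens[i] in translations:
            switched[i] = translations[tokens[i]]
    return " ".join(switched)

# French sentence with two words switched into English.
fr_tokens = ["le", "chat", "dort", "sur", "le", "canapé"]
fr_to_en = {"chat": "cat", "canapé": "sofa"}
print(code_switch(fr_tokens, fr_to_en, [1, 5]))
# prints "le cat dort sur le sofa"
```

A benchmark built this way can then compare a model's answers on the original monolingual sentence against the code-switched version to quantify any degradation (or improvement) in comprehension.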
Why it matters?
This matters because multilingual speakers often switch between languages when they talk or write. Improving AI understanding of code-switching helps build translation tools, chatbots, and language-learning apps that work well in real-world multilingual situations.
Abstract
LLMs' comprehension and reasoning skills are evaluated under code-switching conditions, revealing that embedding English into other languages can improve understanding, while prompting and fine-tuning mitigate the degradation in different ways.