Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
Aabid Karim, Abdul Karim, Bhoomika Lohana, Matt Keon, Jaswinder Singh, Abdul Sattar
2025-03-25
Summary
This paper examines whether AI models that are strong at math remain accurate when the problems contain cultural references the models may not be familiar with.
What's the problem?
AI models are trained on large amounts of data from the internet, but this data may not represent all cultures equally. As a result, it is unclear whether these models can solve math problems that include elements from underrepresented cultures.
What's the solution?
The researchers generated six culturally adapted versions of the GSM8K test set, keeping each problem's mathematical logic and numerical values but changing cultural elements such as personal names, food items, and place names. They then tested AI models on these adapted problems to see whether their performance changed.
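To make the idea of cultural adaptation concrete, here is a minimal sketch of swapping cultural entities in a word problem while leaving the numbers untouched. The summary does not describe the authors' actual procedure, so the `SWAPS` mapping, the example problem, and the helper names `adapt_problem` and `numbers_in` are all hypothetical and only illustrate the constraint that the math must stay the same.

```python
import re

# Hypothetical substitution table; not taken from the paper's datasets.
SWAPS = {
    "Janet": "Ayesha",
    "muffins": "parathas",
    "farmers' market": "Sunday bazaar",
}

def adapt_problem(text, swaps):
    """Replace cultural entities as whole words, leaving everything else as-is."""
    # Try longer keys first so multi-word entities are matched before substrings.
    keys = sorted(swaps, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: swaps[m.group(1)], text)

def numbers_in(text):
    """Return every numeric literal in the text, in order of appearance."""
    return re.findall(r"\d+(?:\.\d+)?", text)

# Made-up example problem in the style of a grade-school word problem.
original = (
    "Janet bakes 12 muffins and sells 5 at the farmers' market. "
    "How many muffins are left?"
)
adapted = adapt_problem(original, SWAPS)

# Sanity check: the adaptation must not alter any numerical value.
assert numbers_in(original) == numbers_in(adapted)
print(adapted)
```

Because only the surface entities change, the reference answer for the original problem remains valid for the adapted one, which is what allows a direct accuracy comparison.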
Why does it matter?
This work matters because it shows that cultural context can affect an AI model's performance even when the underlying math is the same. This means AI models need to be trained on more diverse data so that they can be used fairly and effectively in different parts of the world.
Abstract
Large Language Models (LLMs) have significantly advanced various fields, particularly coding, mathematical reasoning, and logical problem solving. However, a critical question remains: Do these mathematical reasoning abilities persist when LLMs are presented with culturally adapted math problems? Specifically, how do LLMs perform when faced with math problems embedded in cultural contexts that have no significant representation in mainstream web-scale AI training data? To explore this, we generated six synthetic cultural datasets from GSM8K, a widely used benchmark for assessing LLMs' mathematical reasoning skills. While preserving the mathematical logic and numerical values of the original GSM8K test set, we modify cultural elements such as personal names, food items, and place names. These culturally adapted datasets provide a more reliable framework for evaluating LLMs' mathematical reasoning under shifting cultural contexts. Our findings reveal that LLMs struggle with math problems when cultural references change, even though the underlying mathematical structure remains constant. Smaller models exhibit greater performance drops than larger models. Interestingly, our results also suggest that cultural familiarity can enhance mathematical reasoning. Even models with no explicit mathematical training but exposure to relevant cultural contexts sometimes outperform larger, mathematically proficient models on culturally embedded math problems. This study highlights the impact of cultural context on the mathematical reasoning abilities of LLMs, underscoring the need for more diverse and representative training data to improve robustness in real-world applications. The benchmark datasets and the script for reproducing the results are available at https://github.com/akarim23131/Lost_in_Cultural_Translation
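The comparison described in the abstract can be illustrated with a small sketch of how a performance drop between the original and a culturally adapted test set might be measured. Since the numerical values are preserved, the same reference answers apply to both versions. The function names and the toy predictions below are assumptions for illustration only; they are not the authors' evaluation script and not results from the paper.

```python
def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def accuracy_drop(original_preds, adapted_preds, answers):
    """Accuracy on the original problems minus accuracy on the adapted ones.

    A positive value means the model did worse after cultural adaptation.
    """
    return accuracy(original_preds, answers) - accuracy(adapted_preds, answers)

# Toy data, invented for illustration (not taken from the paper).
answers = [7, 18, 3, 42]          # shared reference answers
original_preds = [7, 18, 3, 42]   # model output on the original problems
adapted_preds = [7, 18, 5, 42]    # model output on the adapted problems

drop = accuracy_drop(original_preds, adapted_preds, answers)
print(f"accuracy drop: {drop:.2f}")
```

Computing the drop per model and per adapted dataset would make it possible to compare smaller and larger models, as the abstract does.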