Lost in Literalism: How Supervised Training Shapes Translationese in LLMs
Yafu Li, Ronghao Zhang, Zhilin Wang, Huajian Zhang, Leyang Cui, Yongjing Yin, Tong Xiao, Yue Zhang
2025-03-07
Summary
This paper examines why large language models (LLMs) sometimes produce overly literal, unnatural translations, a problem known as translationese, and explores ways to make their translations sound more natural.
What's the problem?
Even though LLMs are pre-trained on huge amounts of natural language data, they still sometimes produce translations that are too literal or sound unnatural. This happens because of biases introduced during supervised fine-tuning, where models learning to translate may focus too heavily on word-for-word accuracy rather than natural-sounding language.
What's the solution?
The researchers systematically measured how often translationese occurs in LLM translations and investigated why it emerges during training. They then proposed ways to reduce these biases, such as polishing the reference translations used for training and filtering unnatural examples out of the training data. Testing these methods, they found that they significantly reduced translationese and made translations sound more natural, as confirmed by both human evaluators and automatic metrics.
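The filtering idea described above can be sketched in a few lines. The function names and the scoring heuristic here are illustrative assumptions, not the paper's actual implementation: as a toy proxy for naturalness, this sketch flags training pairs whose target length diverges sharply from the source length, whereas the paper relies on model-based signals.

```python
def score_naturalness(source: str, target: str) -> float:
    """Toy proxy for naturalness: penalize targets whose word count
    diverges strongly from the source's. Returns 1.0 when lengths
    match, lower as they diverge. (Illustrative only.)"""
    ratio = len(target.split()) / max(len(source.split()), 1)
    return 1.0 - abs(1.0 - ratio)

def filter_unnatural(pairs, threshold=0.5):
    """Keep only (source, target) training pairs whose naturalness
    score clears the threshold; the rest are dropped."""
    return [(s, t) for s, t in pairs if score_naturalness(s, t) >= threshold]

# Example: the second pair's target is implausibly long and gets filtered out.
pairs = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),
    ("hello", "bonjour bonjour bonjour bonjour bonjour bonjour"),
]
kept = filter_unnatural(pairs)
```

In practice, a learned scorer (e.g., a language-model fluency estimate on the target side) would replace the length heuristic, but the pipeline shape, score each pair and drop the low scorers, stays the same.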
Why it matters?
This research matters because it helps make AI translations sound more natural and fluent, which is important for clear communication across languages. By improving how LLMs are trained for translation, we can get AI systems that produce translations reading more like they were written by native speakers. This could make AI translation tools more useful and reliable for international business, global communication, and accessing information across languages.
Abstract
Large language models (LLMs) have achieved remarkable success in machine translation, demonstrating impressive performance across diverse languages. However, translationese, characterized by overly literal and unnatural translations, remains a persistent challenge in LLM-based translation systems. Despite their pre-training on vast corpora of natural utterances, LLMs exhibit translationese errors and generate unexpected unnatural translations, stemming from biases introduced during supervised fine-tuning (SFT). In this work, we systematically evaluate the prevalence of translationese in LLM-generated translations and investigate its roots during supervised training. We introduce methods to mitigate these biases, including polishing golden references and filtering unnatural training instances. Empirical evaluations demonstrate that these approaches significantly reduce translationese while improving translation naturalness, validated by human evaluations and automatic metrics. Our findings highlight the need for training-aware adjustments to optimize LLM translation outputs, paving the way for more fluent and target-language-consistent translations. We release the data and code at https://github.com/yafuly/LLM_Translationese.