Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs
Yingfeng Luo, Ziqiang Xu, Yuxuan Ouyang, Murun Yang, Dingyang Lin, Kaiyan Chang, Tong Zheng, Bei Li, Peinan Feng, Quan Du, Tong Xiao, Jingbo Zhu
2025-11-12
Summary
This paper introduces LMT, a new suite of large language models for multilingual translation. The models cover 60 languages across 234 translation directions, centered on both Chinese and English.
What's the problem?
Current multilingual translation models, while capable, still struggle to deliver consistently high quality across all languages, and they typically perform best when translating *into* English or Chinese. The researchers found a surprising issue: when trained on symmetric multi-way data (the same sentence pairs used in both directions), the model over-prioritizes translating *into* English or Chinese, which degrades its ability to translate *out of* those languages and between other language pairs. They call this 'directional degeneration'.
What's the solution?
To fix this, the researchers used two main strategies. First, 'Strategic Downsampling': they deliberately reduced the amount of training data for the X-to-English and X-to-Chinese directions, preventing the model from over-fitting to these many-to-one mappings. Second, 'Parallel Multilingual Prompting': during translation, the model is also shown the same sentence in a typologically related auxiliary language, which improves cross-lingual knowledge transfer. They also invested heavily in careful data curation and preparation.
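The two strategies above can be sketched in a few lines. This is an illustrative mock-up, not the paper's actual implementation: the function names, the example dictionary layout, the `keep_ratio` value, and the prompt wording are all assumptions made for the sake of the example.

```python
import random

# Central ("pivot") languages of LMT; X -> pivot directions get downsampled.
PIVOTS = {"en", "zh"}

def strategic_downsample(examples, keep_ratio=0.3, seed=0):
    """Strategic Downsampling sketch: keep every example whose target is
    not a pivot language, but retain only `keep_ratio` of the reverse
    (X -> En/Zh) examples. Each example is a dict with 'src_lang' and
    'tgt_lang' codes (field names are hypothetical)."""
    rng = random.Random(seed)  # seeded for reproducibility
    kept = []
    for ex in examples:
        is_reverse = ex["tgt_lang"] in PIVOTS and ex["src_lang"] not in PIVOTS
        if not is_reverse or rng.random() < keep_ratio:
            kept.append(ex)
    return kept

def pmp_prompt(src_lang, tgt_lang, aux_lang, aux_text, src_text):
    """Parallel Multilingual Prompting sketch: show the model the same
    sentence in a typologically related auxiliary language as extra
    context before asking for the translation."""
    return (
        f"{aux_lang}: {aux_text}\n"
        f"{src_lang}: {src_text}\n"
        f"Translate the {src_lang} sentence into {tgt_lang}:"
    )
```

For instance, when translating German into Czech, a Dutch version of the source sentence could serve as the auxiliary context, since Dutch is closely related to German.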
Why it matters?
This work matters because the LMT models achieve state-of-the-art translation quality among models with comparable language coverage, with the 4B model outperforming the much larger Aya-101 (13B) and NLLB (54B). The models are released in four sizes (0.6B, 1.7B, 4B, and 8B parameters) so other researchers can easily use and build on them, ultimately leading to better and more inclusive translation technology for a wider range of languages.
Abstract
Large language models have significantly advanced Multilingual Machine Translation (MMT), yet broad language coverage, consistent translation quality, and English-centric bias remain open challenges. To address these challenges, we introduce LMT, a suite of Large-scale Multilingual Translation models centered on both Chinese and English, covering 60 languages and 234 translation directions. During development, we identify a previously overlooked phenomenon of directional degeneration, where symmetric multi-way fine-tuning data overemphasize reverse directions (X to En/Zh), leading to excessive many-to-one mappings and degraded translation quality. We propose Strategic Downsampling, a simple yet effective method to mitigate this degeneration. In addition, we design Parallel Multilingual Prompting (PMP), which leverages typologically related auxiliary languages to enhance cross-lingual transfer. Through rigorous data curation and refined adaptation strategies, LMT achieves SOTA performance among models of comparable language coverage, with our 4B model (LMT-60-4B) surpassing the much larger Aya-101-13B and NLLB-54B models by a substantial margin. We release LMT in four sizes (0.6B/1.7B/4B/8B) to catalyze future research and provide strong baselines for inclusive, scalable, and high-quality MMT (https://github.com/NiuTrans/LMT).