A Rising Tide Lifts All Boats: MTQE Rewards for Idioms Improve General Translation Quality

Ishika Agarwal, Zhenlin He, Dhruva Patil, Dilek Hakkani-Tür

2026-01-13

Summary

This research focuses on how well machine translation systems handle expressions like idioms and metaphors, which don't mean exactly what their individual words suggest.

What's the problem?

Current machine translation models struggle with non-compositional language – phrases where the overall meaning isn't simply the sum of the parts. Things like idioms, proverbs, and metaphors are deeply rooted in culture and have both literal and figurative meanings, making them hard for computers to translate accurately because models are typically trained on straightforward, literal text.

What's the solution?

The researchers used a technique called GRPO-style fine-tuning, which trains the translation model to get better at translating idioms. They did this by using another model, called a Machine Translation Quality Estimation (MTQE) model, to give the translation model 'rewards' based on how good its idiom translations were. Testing this approach on Chinese and Hindi idioms, they found that idiom translation quality improved by roughly 14 points, and that general, non-idiomatic translation quality improved by about 8 points as a side effect.
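The core of a GRPO-style update is group-relative reward normalization: for each source sentence, the model samples several candidate translations, an external scorer (here, the MTQE model) rates each one, and each candidate's reward is normalized against its own sampling group so that only relative quality drives the update. A minimal sketch of that normalization step, with made-up scores standing in for real MTQE outputs:

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO-style training:
    each candidate's reward is normalized by the mean and standard
    deviation of its sampling group, so candidates are judged only
    against the other samples for the same source sentence."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    if std == 0.0:
        return [0.0 for _ in rewards]  # all candidates tied: no learning signal
    return [(r - mean) / std for r in rewards]

# Toy example: quality scores for 4 sampled translations of one idiom
# (scores are illustrative, not real MTQE outputs).
scores = [0.82, 0.55, 0.91, 0.60]
advantages = grpo_advantages(scores)
# Candidates scored above the group mean get positive advantages (their
# token log-probabilities are reinforced); those below get negative ones.
```

In full training, these advantages weight the policy-gradient loss on each candidate's tokens; the sketch omits that step and the MTQE scorer itself, which in the paper is a learned quality-estimation model rather than a fixed metric.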

Why it matters?

This work shows just how much harder it is to translate figurative language compared to regular text, and it provides a way to improve machine translation's ability to understand and translate across cultures. Better handling of idioms and metaphors means more accurate and nuanced translations, and ultimately, better communication between people who speak different languages.

Abstract

Non-compositional expressions (e.g., idioms, proverbs, and metaphors) pose significant challenges for neural machine translation systems because their meanings cannot be derived from individual words alone. These expressions encode rich cultural meaning and have both figurative and literal readings, making accurate translation difficult. Because models are fairly good at translating compositional text, we investigate GRPO-style fine-tuning using Machine Translation Quality Estimation (MTQE) models as reward functions to train models to better translate idioms. Using Chinese and Hindi idiom datasets, we find that idiom translation abilities improve by ~14 points, general, non-idiomatic translation implicitly improves by ~8 points, and cross-lingual translation abilities (trained on one language, evaluated on another) improve by ~6 points. Overall, our work quantifies the non-compositional translation gap and offers insights for developing LLMs with stronger cross-cultural and figurative language understanding.