R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Minggui He, Yilun Liu, Shimin Tao, Yuanchang Luo, Hongyong Zeng, Chang Su, Li Zhang, Hongxia Ma, Daimeng Wei, Weibin Meng, Hao Yang, Boxing Chen, Osamu Yoshie
2025-02-28
Summary
This paper introduces R1-T1, a new way to make AI language models better at translating between different languages by teaching them to think more like human translators. It's like giving the AI a translator's brain to help it understand and translate more accurately across many languages and types of text.
What's the problem?
Current AI translation systems don't use the same kind of step-by-step thinking that human translators use. They either have a fixed way of thinking that only works for specific types of translation, or they try to come up with their own way of thinking that doesn't match how humans actually translate. This makes it hard for AI to handle different kinds of translations well, especially for languages or types of text it hasn't seen before.
What's the solution?
The researchers created R1-T1, which uses a method called reinforcement learning to teach AI how to think through translations like a human would. They came up with six different patterns of thinking that match how real translators work, and taught the AI to use these patterns. R1-T1 can handle translations for many languages and different types of text, like legal documents or medical information. It can also figure out new ways to think through translations on its own, and it doesn't forget how to do general translations while learning new, specific ones.
Why it matters?
This matters because it could make AI translators much better at handling all kinds of translations, even for languages they weren't specifically trained on. This could help break down language barriers in many areas, from business and science to everyday communication between people who speak different languages. It's a big step towards making AI translation as good as, or even better than, human translation across many languages and types of text.
Abstract
Despite recent breakthroughs in reasoning-enhanced large language models (LLMs) like DeepSeek-R1, incorporating inference-time reasoning into machine translation (MT), where human translators naturally employ structured, multi-layered reasoning chains-of-thought (CoTs), remains underexplored. Existing methods either design a fixed CoT tailored to a specific MT sub-task (e.g., literature translation), or rely on synthesizing CoTs unaligned with human reasoning and on supervised fine-tuning (SFT) prone to catastrophic forgetting, limiting their adaptability to diverse translation scenarios. This paper introduces R1-Translator (R1-T1), a novel framework to achieve inference-time reasoning for general MT via reinforcement learning (RL) with human-aligned CoTs comprising six common patterns. Our approach pioneers three innovations: (1) extending reasoning-based translation beyond MT sub-tasks to six languages and diverse tasks (e.g., legal/medical domain adaptation, idiom resolution); (2) formalizing six expert-curated CoT templates that mirror hybrid human strategies like context-aware paraphrasing and back translation; and (3) enabling self-evolving CoT discovery and anti-forgetting adaptation through RL with KL-constrained rewards. Experimental results indicate a steady translation performance improvement in 21 languages and 80 translation directions on the Flores-101 test set, especially on the 15 languages unseen during training, with general multilingual abilities preserved compared with plain SFT.
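The "KL-constrained rewards" mentioned in the abstract can be illustrated with a minimal sketch: the policy model is rewarded for translation quality but penalized for drifting too far from a frozen reference model, which is what guards against catastrophic forgetting. All function names, the per-token log-probabilities, and the beta value below are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a KL-constrained RL reward (illustrative, not the
# paper's exact formulation): quality reward minus a KL penalty that
# keeps the fine-tuned policy close to a frozen reference model.

def kl_penalty(policy_logprobs, ref_logprobs):
    """Per-token KL estimate: mean of (log p_policy - log p_ref)
    over tokens sampled from the policy."""
    diffs = [p - r for p, r in zip(policy_logprobs, ref_logprobs)]
    return sum(diffs) / len(diffs)

def shaped_reward(quality_score, policy_logprobs, ref_logprobs, beta=0.1):
    """Translation-quality reward, discounted by how far the policy
    has drifted from the reference model (scaled by beta)."""
    return quality_score - beta * kl_penalty(policy_logprobs, ref_logprobs)

# A good translation whose policy barely drifts keeps most of its reward:
r = shaped_reward(0.9, [-1.0, -0.5, -0.2], [-1.1, -0.6, -0.2], beta=0.1)
```

Raising `beta` trades translation-quality gains for stability: the policy stays closer to the reference model, which is the anti-forgetting behavior the paper targets.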