
Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

Junxiao Liu, Zhijun Wang, Yixiao Li, Zhejian Lai, Liqian Huang, Xin Huang, Xue Han, Junlan Feng, Shujian Huang

2026-02-09


Summary

This paper investigates why large language models built for long, complex reasoning often stumble when asked questions in languages other than English, particularly on tasks that require mathematical reasoning.

What's the problem?

When these models try to solve problems posed in a language other than English, they frequently default to thinking in English first and then translating the answer. If they are instead forced to reason directly in the question's language, their accuracy drops sharply. The root cause is that the models are weak at both understanding the question *in* another language and carrying out the reasoning steps *in* that language.

What's the solution?

The researchers developed a training method called TRIT, which stands for Translation-Reasoning Integrated Training. TRIT folds the task of learning to translate into the task of learning to reason: the model learns to translate the question as part of solving it, without extra translated examples or external feedback. As a result, it improves at understanding the question and at generating a correct answer in the question's language at the same time.
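
To make the idea more concrete, here is a minimal sketch of what a self-improving loop that ties a translation sub-task to multilingual reasoning could look like. The Model stub, the prompt wording, the majority-vote filter, and the finetune() call are all illustrative assumptions for this sketch, not the paper's actual TRIT recipe.

```python
# Hypothetical sketch of a self-improving translation+reasoning loop.
# All interfaces below are placeholders, not the paper's implementation.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Question:
    text: str   # question in its original (non-English) language
    lang: str   # language code, e.g. "zh"


class Model:
    """Placeholder for a long-reasoning LLM with a text-in/text-out API."""

    def generate(self, prompt: str) -> str:
        return ""   # replace with a real model call

    def finetune(self, pairs: list[tuple[str, str]]) -> None:
        pass        # replace with a real fine-tuning step


def self_improve_round(model: Model, questions: list[Question], k: int = 4) -> int:
    """One round: translate, reason, keep self-consistent traces, retrain."""
    kept: list[tuple[str, str]] = []
    for q in questions:
        # 1) Make question understanding explicit: the model translates the
        #    question into English as part of solving it.
        translation = model.generate(f"Translate to English:\n{q.text}")

        # 2) Sample k reasoning traces in the question's own language,
        #    conditioned on the original question and the model's translation.
        prompt = (f"Question ({q.lang}): {q.text}\n"
                  f"English translation: {translation}\n"
                  f"Reason step by step in {q.lang}; end with 'Answer: <value>'.")
        traces = [model.generate(prompt) for _ in range(k)]

        # 3) Self-generated supervision: keep only traces whose final answer
        #    matches the majority vote -- no external judge or extra data.
        finals = [t.rsplit("Answer:", 1)[-1].strip() for t in traces]
        majority, _ = Counter(finals).most_common(1)[0]
        kept += [(prompt, t) for t, f in zip(traces, finals) if f == majority]

    # 4) Fine-tune on the accepted translation+reasoning pairs, so both
    #    question understanding and answer generation improve together.
    model.finetune(kept)
    return len(kept)
```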

Why it matters?

This work is important because it improves the ability of AI models to work with information in many different languages. This is crucial for making these powerful tools accessible to a wider global audience and for solving problems that require understanding information from diverse sources. The improvements in both reasoning accuracy and translation quality demonstrate a significant step forward in building truly multilingual AI.

Abstract

Long reasoning models often struggle in multilingual settings: they tend to reason in English for non-English questions; when constrained to reasoning in the question language, accuracies drop substantially. The struggle is caused by the limited abilities for both multilingual question understanding and multilingual reasoning. To address both problems, we propose TRIT (Translation-Reasoning Integrated Training), a self-improving framework that integrates the training of translation into multilingual reasoning. Without external feedback or additional multilingual data, our method jointly enhances multilingual question understanding and response generation. On MMATH, our method outperforms multiple baselines by an average of 7 percentage points, improving both answer correctness and language consistency. Further analysis reveals that integrating translation training improves cross-lingual question alignment by over 10 percentage points and enhances translation quality for both mathematical questions and general-domain text, with gains up to 8.4 COMET points on FLORES-200.