DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou
2024-12-24
Summary
This paper introduces DRT-o1, a new model designed to improve how machines translate literature by using long chain-of-thought reasoning, a method that helps the model think deeply about the meaning of phrases, especially those containing similes and metaphors.
What's the problem?
Translating literature can be very challenging because it often contains figurative language like similes and metaphors that don't translate directly into other languages. Traditional translation methods can miss the intended meaning, leading to translations that feel awkward or incorrect. Even skilled human translators need to spend a lot of time thinking about how to convey these meanings accurately.
What's the solution?
To tackle this problem, the authors created DRT-o1, which uses a multi-agent framework with three parts: a translator that produces the actual translation, an advisor that offers suggestions during the process, and an evaluator that checks whether each round's translation improves on the previous one. This approach lets the model think through the translation more thoroughly, resulting in better-quality output. They trained DRT-o1 on tens of thousands of sentences containing figurative language to strengthen its ability to handle complex translations.
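The translator–advisor–evaluator loop can be sketched as pseudocode. This is a minimal illustration of the iterative structure described above, not the paper's implementation: the three agent functions here are simple stand-ins (in the actual system each would be an LLM call), and all names and stopping rules are assumptions.

```python
def translate(source, suggestion=None):
    """Translator agent (stub): produce a translation, optionally
    revised under the advisor's suggestion."""
    if suggestion is None:
        return f"literal({source})"
    return f"refined({source}, hint={suggestion})"

def advise(source, translation):
    """Advisor agent (stub): suggest how to better convey the
    figurative meaning of the source sentence."""
    return "unpack-metaphor"

def evaluate(source, translation):
    """Evaluator agent (stub): score the translation; higher is better."""
    return 1.0 if translation.startswith("refined") else 0.0

def long_thought_translate(source, max_rounds=4):
    """Iteratively refine a translation, recording the 'long thought'
    trace. Stop when a round fails to improve the evaluator's score."""
    translation = translate(source)
    score = evaluate(source, translation)
    trace = [(translation, score)]
    for _ in range(max_rounds):
        suggestion = advise(source, translation)
        candidate = translate(source, suggestion)
        candidate_score = evaluate(source, candidate)
        if candidate_score <= score:  # no improvement: end the chain
            break
        translation, score = candidate, candidate_score
        trace.append((translation, score))
    return translation, trace

final, trace = long_thought_translate("the moon is a silver coin")
```

The recorded trace (each round's translation and score) is what would be collected as long-thought training data in a setup like the paper's.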
Why it matters?
This research is important because it shows how advanced reasoning techniques can significantly improve machine translation, especially for challenging literary texts. By enhancing the way machines understand and translate nuanced language, DRT-o1 can help make literature more accessible across different languages and cultures, improving communication and understanding globally.
Abstract
Recently, O1-like models have emerged as representative examples, illustrating the effectiveness of long chain-of-thought (CoT) in reasoning tasks such as math and coding. In this paper, we introduce DRT-o1, an attempt to bring the success of long CoT to neural machine translation (MT). Specifically, since literature books often involve similes and metaphors, translating these texts into a target language is very difficult in practice due to cultural differences. In such cases, literal translation often fails to convey the intended meaning effectively. Even for professional human translators, considerable thought must be given to preserving semantics throughout the translation process. To simulate LLMs' long-thought ability in MT, we first mine sentences containing similes or metaphors from existing literature books, and then develop a multi-agent framework to translate these sentences via long thought. In the multi-agent framework, a translator iteratively translates the source sentence under suggestions provided by an advisor. To ensure the effectiveness of the long thoughts, an evaluator is also employed to judge whether the translation in the current round improves on the previous one. In this manner, we collect tens of thousands of long-thought MT samples, which are used to train our DRT-o1. Experimental results on literature translation demonstrate the effectiveness of DRT-o1. Using Qwen2.5-7B and Qwen2.5-14B as the backbones, DRT-o1 brings improvements of 7.33~8.26 BLEU and 1.66~3.36 CometScore. Besides, DRT-o1-7B outperforms QwQ-32B-Preview by 7.82 BLEU and 1.46 CometScore, showing its effectiveness. The project is available at https://github.com/krystalan/DRT-o1.