Reverse-Engineered Reasoning for Open-Ended Generation
Haozhe Wang, Haoran Que, Qixin Xu, Minghao Liu, Wangchunshu Zhou, Jiazhan Feng, Wanjun Zhong, Wei Ye, Tong Yang, Wenhao Huang, Ge Zhang, Fangzhen Lin
2025-09-09
Summary
This paper introduces a new method called REER for improving how AI systems perform complex, creative tasks like writing, moving away from traditional approaches that struggle with these kinds of open-ended problems.
What's the problem?
Current AI techniques for 'deep reasoning' – essentially, making a series of logical steps to arrive at an answer – work well in domains like math, where an answer can be checked objectively. They falter on creative tasks like story writing, where there is no single 'correct' answer and where learning from examples would require a lot of expensive training data. Existing methods either need reliable feedback signals, which are hard to come by in creative tasks, or rely on a very capable 'teacher' AI, which is costly to use and caps how good the student AI can become.
What's the solution?
The researchers developed REER, which works in reverse. Instead of trying to *build* a reasoning process, they start with good solutions (like well-written stories) and work *backwards* to figure out what steps the AI could have taken to create them. This is done without needing a lot of trial and error or a powerful teacher AI. They also created a large dataset of these 'reasoning trajectories' called DeepWriting-20K to help train their model, DeepWriter-8B.
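The paper doesn't give pseudocode here, but the backwards search can be pictured as a gradient-free local search: start from a rough candidate trajectory and keep mutations that make the known-good solution better 'explained' by the trajectory. Below is a minimal sketch of that idea; the toy word-overlap scorer stands in for a language-model likelihood, and the names `score` and `reverse_engineer` are illustrative assumptions, not the paper's actual implementation.

```python
import random

def score(trajectory, solution):
    # Toy stand-in for a language-model likelihood: rewards trajectories
    # whose steps share vocabulary with the known-good solution.
    words = set(solution.lower().split())
    hits = sum(1 for step in trajectory
               for w in step.lower().split() if w in words)
    return hits / max(1, len(trajectory))

def reverse_engineer(solution, candidate_steps, n_steps=3, iters=200, seed=0):
    """Gradient-free local search: start from a random trajectory and
    greedily mutate one step at a time, keeping any change that better
    'explains' the known-good solution under the scorer."""
    rng = random.Random(seed)
    traj = [rng.choice(candidate_steps) for _ in range(n_steps)]
    best = score(traj, solution)
    for _ in range(iters):
        proposal = traj[:]
        proposal[rng.randrange(n_steps)] = rng.choice(candidate_steps)
        s = score(proposal, solution)
        if s > best:
            traj, best = proposal, s
    return traj, best

solution = "a quiet storm builds over the harbor"
steps = ["set the scene at the harbor", "introduce a storm motif",
         "list rhyming words", "describe a desert"]
trajectory, fit = reverse_engineer(solution, steps)
```

In the paper's setting, the scorer would be a real model's likelihood of the finished text given the trajectory, and the candidate steps would be generated rather than enumerated, but the search loop itself needs no gradients or teacher model.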
Why it matters?
This research matters because it lets AI perform creative tasks at a much higher level, rivaling some of the best models currently available, such as GPT-4o and Claude 3.5. It opens the door to more sophisticated AI writing tools, and potentially other creative applications, by offering a more efficient and scalable way to teach AI to 'think' through open-ended problems.
Abstract
While the "deep reasoning" paradigm has spurred significant advances in verifiable domains like mathematics, its application to open-ended, creative generation remains a critical challenge. The two dominant methods for instilling reasoning -- reinforcement learning (RL) and instruction distillation -- falter in this area; RL struggles with the absence of clear reward signals and high-quality reward models, while distillation is prohibitively expensive and capped by the teacher model's capabilities. To overcome these limitations, we introduce REverse-Engineered Reasoning (REER), a new paradigm that fundamentally shifts the approach. Instead of building a reasoning process "forwards" through trial-and-error or imitation, REER works "backwards" from known-good solutions to computationally discover the latent, step-by-step deep reasoning process that could have produced them. Using this scalable, gradient-free approach, we curate and open-source DeepWriting-20K, a large-scale dataset of 20,000 deep reasoning trajectories for open-ended tasks. Our model, DeepWriter-8B, trained on this data, not only surpasses strong open-source baselines but also achieves performance competitive with, and at times superior to, leading proprietary models like GPT-4o and Claude 3.5.