Information-Preserving Reformulation of Reasoning Traces for Antidistillation
Jiayu Ding, Lei Cui, Li Dong, Nanning Zheng, Furu Wei
2025-10-15
Summary
This paper focuses on a problem with how large language models, or LLMs, explain their reasoning. While detailed explanations are helpful for people to understand *how* an LLM arrived at an answer, they also make it easier for others to copy the model's knowledge. The paper introduces a method called PART to protect the model's reasoning without sacrificing the helpful explanations.
What's the problem?
LLMs are getting better at complex tasks when they show their step-by-step reasoning. However, these detailed reasoning chains can easily be used to train smaller, copycat models, a process called distillation. Companies that create these powerful LLMs want to protect their work, so they often shorten or summarize the explanations, but this makes the reasoning less useful for people. It's a trade-off between security and usability.
What's the solution?
The researchers developed PART, which reformulates the reasoning traces in two main ways. First, it removes 'self-talk': steps where the model appears to be thinking aloud without contributing to the final answer. Second, it reorders the sub-conclusions within the reasoning so that a student model cannot easily imitate the original model's step-by-step patterns. A small auxiliary model is trained to perform this reformulation, so the method adds little extra computing cost.
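The two steps can be illustrated with a minimal sketch. This is not the paper's actual implementation: PART trains a small auxiliary model to do the rewriting, whereas the `SELF_TALK_MARKERS` list and the random reordering below are simplified, hypothetical stand-ins that only convey the shape of the transformation.

```python
import random

# Hypothetical markers of 'self-talk'. The paper uses a trained auxiliary
# model to identify such steps; a keyword filter merely illustrates the idea.
SELF_TALK_MARKERS = ("Wait,", "Hmm,", "Actually,", "Let me double-check")

def remove_self_talk(steps):
    """Step 1: drop steps where the model appears to be thinking aloud."""
    return [s for s in steps if not s.startswith(SELF_TALK_MARKERS)]

def reorder_subconclusions(steps, seed=0):
    """Step 2: permute the sub-conclusions so a student model cannot
    imitate the teacher's original step ordering."""
    rng = random.Random(seed)
    shuffled = steps[:]
    rng.shuffle(shuffled)
    return shuffled

def part_reformulate(trace, seed=0):
    """Apply both steps to a newline-separated reasoning trace."""
    steps = [s.strip() for s in trace.split("\n") if s.strip()]
    return "\n".join(reorder_subconclusions(remove_self_talk(steps), seed))

trace = (
    "The triangle has legs 3 and 4.\n"
    "Wait, let me re-read the problem.\n"
    "By the Pythagorean theorem, the hypotenuse is 5.\n"
    "Hmm, does that check out? 9 + 16 = 25, yes.\n"
    "So the answer is 5."
)
print(part_reformulate(trace))
```

The output keeps every substantive step (so a human reader loses no information about the solution) while stripping the conversational filler and scrambling the step order that a distillation pipeline would otherwise imitate.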
Why does it matter?
This work is important because it offers a way to protect the intellectual property of LLM developers *without* making the models less transparent to users. By disrupting the distillation process, PART helps ensure that the benefits of detailed reasoning aren't exploited for unauthorized copying, while still allowing people to understand and learn from the model's thought process. Experiments showed a significant drop in student-model performance when distilling from PART-reformulated traces.
Abstract
Recent advances in Large Language Models (LLMs) show that extending the length of reasoning chains significantly improves performance on complex tasks. While revealing these reasoning traces helps users better follow, verify, and learn from the model's problem-solving process, it also makes them highly vulnerable to unauthorized distillation. To mitigate this risk, proprietary model providers often adopt aggressive protection strategies, such as replacing detailed reasoning with brief summaries, which deprive users of valuable intermediate information. To address this trade-off, we propose PART, an information-preserving antidistillation reformulation of reasoning traces. Motivated by the difference between how humans understand reasoning traces and how LLMs exploit them for supervised fine-tuning, we design a simple but effective two-step reformulation: removing self-talk behaviors and reordering sub-conclusions. A small auxiliary model is trained to perform this reformulation, incurring minimal computational overhead. Extensive experiments demonstrate that PART consistently disrupts distillation across student models of different sizes and types on various reasoning benchmarks. For instance, when training on reformulated traces, even the performance of a large 32B student model decreases from 54.17 to 46.88 on AIME 2024, corresponding to a 13.5% degradation.