
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Zhen Huang, Haoyang Zou, Xuefeng Li, Yixiu Liu, Yuxiang Zheng, Ethan Chern, Shijie Xia, Yiwei Qin, Weizhe Yuan, Pengfei Liu

2024-11-26


Summary

This paper discusses the journey of replicating OpenAI's O1 model, focusing on how knowledge distillation techniques can enhance performance in complex tasks, particularly in mathematical reasoning.

What's the problem?

Replicating advanced AI models like OpenAI's O1 can be challenging due to the complexity of the tasks they perform. Many existing methods for replication are not transparent and often rely on complicated techniques that may not be effective. This makes it hard to understand how to achieve similar performance without extensive resources or knowledge.

What's the solution?

The authors apply knowledge distillation, in which a simpler model (the student) is trained to imitate a more capable model (the teacher). They show that by distilling long reasoning chains from O1's API and using supervised fine-tuning, they can create a model that outperforms O1-preview on difficult math problems with far less technical complexity. Their experiments also show that this approach generalizes well: even though the model is trained primarily on mathematical data, it performs strongly across a range of other tasks.
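The pipeline described above can be sketched in two steps: query the teacher for long-thought solutions, then package the (problem, reasoning chain) pairs as supervised fine-tuning examples for the student. The sketch below illustrates only the data-preparation step; `query_teacher` and the prompt format are hypothetical stand-ins, not the paper's actual implementation.

```python
# Sketch of a distillation data pipeline: collect long-thought chains
# from a teacher model, then package them as supervised fine-tuning
# (SFT) examples for a student model. `query_teacher` is a placeholder
# for a real API call (e.g., to O1); the prompt format is illustrative.

from typing import Callable


def query_teacher(problem: str) -> str:
    """Stand-in for an API call returning a long reasoning chain."""
    return (f"<thought>step-by-step reasoning for: {problem}</thought>\n"
            "Answer: 42")


def build_sft_dataset(problems: list[str],
                      teacher: Callable[[str], str]) -> list[dict]:
    """Pair each problem with the teacher's distilled reasoning chain."""
    dataset = []
    for problem in problems:
        chain = teacher(problem)  # distill: record the teacher's output
        dataset.append({
            "prompt": f"Solve the following problem.\n{problem}",
            # The student is later fine-tuned to reproduce this completion.
            "completion": chain,
        })
    return dataset


examples = build_sft_dataset(["What is 6 * 7?"], query_teacher)
```

The resulting `examples` list is the kind of input a standard SFT trainer consumes; the fine-tuning step itself (omitted here) is ordinary supervised training on these prompt/completion pairs.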

Why it matters?

This research is important because it highlights a more efficient way to replicate powerful AI models while promoting transparency in the field. By sharing their findings and methods, the authors encourage other researchers to adopt similar practices, which can lead to better understanding and advancements in AI technology. Additionally, it emphasizes the need for researchers to focus on foundational principles rather than shortcuts, ensuring that future AI systems are built on solid ground.

Abstract

This paper presents a critical examination of current approaches to replicating OpenAI's O1 model capabilities, with particular focus on the widespread but often undisclosed use of knowledge distillation techniques. While our previous work explored the fundamental technical path to O1 replication, this study reveals how simple distillation from O1's API, combined with supervised fine-tuning, can achieve superior performance on complex mathematical reasoning tasks. Through extensive experiments, we show that a base model fine-tuned on just tens of thousands of O1-distilled long-thought chain samples outperforms O1-preview on the American Invitational Mathematics Examination (AIME) with minimal technical complexity. Moreover, our investigation extends beyond mathematical reasoning to explore the generalization capabilities of O1-distilled models across diverse tasks: hallucination, safety, and open-domain QA. Notably, despite training only on mathematical problem-solving data, our models demonstrated strong generalization to open-ended QA tasks and became significantly less susceptible to sycophancy after fine-tuning. We deliberately make this finding public to promote transparency in AI research and to challenge the current trend of obscured technical claims in the field. Our work includes: (1) a detailed technical exposition of the distillation process and its effectiveness; (2) a comprehensive benchmark framework for evaluating and categorizing O1 replication attempts based on their technical transparency and reproducibility; and (3) a critical discussion of the limitations and potential risks of over-relying on distillation approaches. Our analysis culminates in a crucial bitter lesson: while the pursuit of more capable AI systems is important, the development of researchers grounded in first-principles thinking is paramount.