
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation

Bo Pang, Hanze Dong, Jiacheng Xu, Silvio Savarese, Yingbo Zhou, Caiming Xiong

2025-02-07


Summary

This paper introduces BOLT, a new method for teaching AI language models to think through complex problems step by step, without needing to copy from more advanced models.

What's the problem?

Current methods for improving AI reasoning often rely on distilling outputs from advanced models like OpenAI's o1, which limits progress and leaves it unclear how to develop these abilities systematically. These methods also tend to focus mostly on math problems, which doesn't help the AI learn to reason about other subjects.

What's the solution?

The researchers developed BOLT, a three-stage process that helps AI models learn to think through problems on their own. First, they use a handful of in-context examples (just 10 in their experiments) to get a standard instruct model to produce long chains of thought. Then, they fine-tune the model on this bootstrapped data. Finally, they use online training to refine its reasoning further. This method works across different model sizes and doesn't need expensive human-created examples.
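The three stages above can be sketched as a simple pipeline. Everything below is illustrative: the function names, the stub models, and the reward signal are assumptions for exposition, not the paper's actual implementation.

```python
# Toy sketch of the three BOLT stages (all names here are hypothetical).

def bootstrap_longcot(instruct_model, in_context_examples, queries):
    """Stage 1: prompt a standard instruct model with a few LongCoT
    demonstrations so it emits long chains of thought for new queries."""
    prefix = "\n\n".join(in_context_examples)
    return [f"{prefix}\n\nQ: {q}\nLongCoT: {instruct_model(q)}" for q in queries]

def supervised_finetune(model, longcot_data):
    """Stage 2: fine-tune on the bootstrapped LongCoT data (stubbed here
    as recording how many training examples were seen)."""
    model["sft_examples"] = len(longcot_data)
    return model

def online_training(model, reward_fn, rounds=3):
    """Stage 3: refine LongCoT with online, reward-driven updates
    (stubbed as repeatedly scoring the model)."""
    for _ in range(rounds):
        model["reward"] = reward_fn(model)
    return model

# Usage: 10 hand-built in-context examples, matching the paper's count.
examples = [f"demo {i}" for i in range(10)]
instruct_model = lambda q: f"step-by-step reasoning about {q}"

data = bootstrap_longcot(instruct_model, examples, ["What is 2 + 2?"])
model = supervised_finetune({}, data)
model = online_training(model, reward_fn=lambda m: m["sft_examples"])
```

The point of the sketch is the data flow: stage 1 needs only a frozen instruct model plus a few demonstrations, and each later stage consumes the previous stage's output, so no external LongCoT teacher is ever required.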

Why it matters?

This research matters because it shows a way to make AI smarter and better at solving complex problems without relying on copying from other advanced AIs. This could lead to more capable and independent AI systems that can reason about a wide range of subjects, not just math. It also means we might be able to create smarter AIs more easily and cheaply in the future.

Abstract

Large language models (LLMs), such as o1 from OpenAI, have demonstrated remarkable reasoning capabilities. o1 generates a long chain-of-thought (LongCoT) before answering a question. LongCoT allows LLMs to analyze problems, devise plans, reflect, and backtrack effectively. These actions empower LLMs to solve complex problems. After the release of o1, many teams have attempted to replicate its LongCoT and reasoning capabilities. In terms of methods, they primarily rely on knowledge distillation with data from existing models with LongCoT capacities (e.g., OpenAI-o1, Qwen-QwQ, DeepSeek-R1-Preview), leaving significant uncertainties about how to develop such reasoning abilities systematically. In terms of data domains, these works focus narrowly on math while a few others include coding, limiting their generalizability. This paper introduces a novel approach to enable LLMs' LongCoT capacity without distillation from o1-like models or expensive human annotations, where we bootstrap LongCoT (BOLT) from a standard instruct model. BOLT involves three stages: 1) LongCoT data bootstrapping with in-context learning on a standard instruct model; 2) LongCoT supervised finetuning; 3) online training to further refine LongCoT capacities. In BOLT, only a few in-context examples need to be constructed during the bootstrapping stage; in our experiments, we created 10 examples, demonstrating the feasibility of this approach. We use Llama-3.1-70B-Instruct to bootstrap LongCoT and apply our method to various model scales (7B, 8B, 70B). We achieve impressive performance on a variety of benchmarks, including Arena-Hard, MT-Bench, WildBench, ZebraLogic, and MATH500, which evaluate diverse task-solving and reasoning capabilities.