
Chain of Draft: Thinking Faster by Writing Less

Silei Xu, Wenhao Xie, Lingxiao Zhao, Pengcheng He

2025-03-03


Summary

This paper introduces Chain of Draft (CoD), a new way to make AI language models think more efficiently. It's like teaching AI to take quick, smart notes instead of writing long essays when solving problems.

What's the problem?

Current methods for making AI solve complex problems, like Chain-of-Thought (CoT), make the AI write out long, detailed explanations. This takes a lot of time and uses up a lot of computing power, which can be expensive and slow.

What's the solution?

The researchers created Chain of Draft (CoD), which teaches AI to think more like humans do when problem-solving. Instead of writing out every step in detail, CoD makes the AI jot down short, key ideas. This helps the AI focus on the most important parts of the problem without getting bogged down in unnecessary details.
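The idea can be sketched as a difference in prompting style. Below is a minimal illustrative sketch; the instruction wording and the sample answers are assumptions for illustration, not the paper's exact prompts or model outputs.

```python
# Illustrative sketch of Chain-of-Thought vs. Chain-of-Draft prompting.
# The instruction strings and sample answers below are hypothetical,
# written to show the contrast the paper describes.

COT_INSTRUCTION = (
    "Think step by step to answer the following question. "
    "Return the answer at the end after a separator ####."
)

COD_INSTRUCTION = (
    "Think step by step, but keep only a minimal draft for each "
    "thinking step, just a few words each. "
    "Return the answer at the end after a separator ####."
)

# Hypothetical model outputs for the same simple arithmetic question,
# showing how a terse draft carries the same reasoning in far fewer words.
cot_answer = (
    "Jason started with 20 lollipops. After giving some to Denny, he has "
    "12 lollipops left. To find how many he gave away, we subtract the "
    "remaining lollipops from the starting amount: 20 - 12 = 8. "
    "#### 8"
)
cod_answer = "20 - 12 = 8 #### 8"

def word_count(text: str) -> int:
    """Rough proxy for token count: whitespace-separated words."""
    return len(text.split())

if __name__ == "__main__":
    print(f"CoT words: {word_count(cot_answer)}")
    print(f"CoD words: {word_count(cod_answer)}")
```

Both responses reach the same final answer, but the draft-style response spends far fewer words on the intermediate reasoning, which is the source of the cost and latency savings the paper reports.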

Why does it matter?

This matters because it can make AI problem-solving much faster and cheaper without losing accuracy. The researchers found that CoD could solve problems as well as or better than the longer method while using as little as 7.6% of the tokens. This could make AI tools more practical and affordable for real-world use, especially in situations where quick responses matter.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermediate thoughts that capture only essential information. In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tasks. By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks.