Z1: Efficient Test-time Scaling with Code
Zhaojian Yu, Yinghao Wu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang
2025-04-02
Summary
This paper introduces a way to make AI models better at solving complex problems by training them on code and teaching them to reason more efficiently.
What's the problem?
AI models often generate far more 'thinking' (reasoning) than a problem needs, which wastes computing power and time.
What's the solution?
The researchers trained AI models on coding problems paired with both short and long solutions, and then created a 'Shifted Thinking Window' that caps unnecessary reasoning at test time.
Why does it matter?
This work matters because it can make AI models faster and more efficient at problem-solving, while still maintaining accuracy.
Abstract
Large Language Models (LLMs) can achieve enhanced complex problem-solving through test-time compute scaling, yet this often entails longer contexts and high reasoning-token costs. In this paper, we propose an efficient test-time scaling method that trains LLMs on code-related reasoning trajectories, enabling them to reduce excess thinking tokens while maintaining performance. First, we create Z1-Code-Reasoning-107K, a curated dataset of simple and complex coding problems paired with their short and long solution trajectories. Second, we present a novel Shifted Thinking Window that mitigates overthinking overhead by removing context-delimiting tags (e.g., <think>...</think>) and capping reasoning tokens. Trained on both long and short trajectory data and equipped with the Shifted Thinking Window, our model, Z1-7B, adjusts its reasoning level to the complexity of the problem and exhibits efficient test-time scaling across different reasoning tasks, matching R1-Distill-Qwen-7B performance with about 30% of its average thinking tokens. Notably, although fine-tuned only on code trajectories, Z1-7B generalizes to broader reasoning tasks (47.5% on GPQA Diamond). Our analysis of efficient reasoning elicitation also provides valuable insights for future research.
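To make the idea of capping reasoning tokens concrete, here is a minimal sketch of what a budget-limited, delimiter-free decoding loop could look like with a Hugging Face-style interface. The model name, token budget, and answer-forcing hint string are illustrative assumptions for this sketch, not the paper's exact implementation of the Shifted Thinking Window.

```python
# Hypothetical sketch of capped-reasoning decoding: generate without
# <think>...</think> delimiters, stop once a reasoning-token budget is
# reached, then append a hint that nudges the model to emit its final
# answer. Model name, budget, and hint text are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "efficientscaling/Z1-7B"        # checkpoint name assumed
THINKING_BUDGET = 2048                  # reasoning-token cap (assumed value)
ANSWER_HINT = "\nThe final answer is:"  # illustrative answer-forcing suffix

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

def generate_with_thinking_cap(problem: str) -> str:
    # Phase 1: let the model reason freely, but only up to the budget.
    inputs = tokenizer(problem, return_tensors="pt").to(model.device)
    reasoning = model.generate(
        **inputs,
        max_new_tokens=THINKING_BUDGET,
        do_sample=False,
    )
    text = tokenizer.decode(reasoning[0], skip_special_tokens=True)

    # If generation stopped on its own (a simple problem needing a short
    # trajectory), return the output unchanged.
    new_tokens = reasoning.shape[-1] - inputs["input_ids"].shape[-1]
    if new_tokens < THINKING_BUDGET:
        return text

    # Phase 2: the budget was exhausted, so append the hint and let the
    # model produce a short final answer instead of continuing to overthink.
    forced = tokenizer(text + ANSWER_HINT, return_tensors="pt").to(model.device)
    answer = model.generate(**forced, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(answer[0], skip_special_tokens=True)
```

In this sketch, easy problems finish within the budget and pay no overhead, while hard problems are cut off at the cap and steered directly to an answer, which is the efficiency behavior the abstract describes.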