Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang
2025-11-12
Summary
This paper introduces VibeThinker-1.5B, a surprisingly capable AI model with only 1.5 billion parameters, and shows that it can perform reasoning tasks as well as models hundreds of times larger.
What's the problem?
The prevailing trend in AI has been to make models ever bigger (DeepSeek R1 has 671 billion parameters; Kimi k2 exceeds a trillion), on the assumption that more parameters automatically mean better performance. This is expensive and limits who can do advanced AI research. The core question is whether small models can achieve strong reasoning abilities without massive scale.
What's the solution?
The researchers developed VibeThinker-1.5B using a new approach called the Spectrum-to-Signal Principle (SSP). The model is first trained, via a two-stage diversity-exploring distillation step, to generate a broad spectrum of different candidate solutions to each problem; a reinforcement learning stage (MaxEnt-Guided Policy Optimization) then amplifies the correct "signal" among them. Total training cost was only about $7,800. Essentially, they focused on *how* the model learns, not just *how much* data it processes.
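The spectrum-then-signal idea can be sketched in miniature. The snippet below is a toy illustration under strong assumptions (a discrete answer distribution as the "policy", a perfect verifier, and a hand-rolled reweighting rule), not the paper's actual SFT/RL pipeline: it first samples a broad "spectrum" of candidate answers, then repeatedly shifts probability mass toward the verified-correct "signal".

```python
import random
from collections import Counter

# Toy sketch of the Spectrum-to-Signal idea (a simplification, NOT the
# paper's actual training code). A "policy" here is just a probability
# distribution over candidate answers.

def sample_spectrum(policy, rng, n=1000):
    """Stage 1: draw a broad spectrum of candidate answers from the policy."""
    answers = list(policy)
    weights = [policy[a] for a in answers]
    return rng.choices(answers, weights=weights, k=n)

def amplify_signal(policy, samples, is_correct, lr=0.5):
    """Stage 2: shift probability mass toward verified-correct answers."""
    n = len(samples)
    freq = Counter(samples)
    boosted = {
        a: p + lr * (freq[a] / n) * (1.0 if is_correct(a) else 0.0)
        for a, p in policy.items()
    }
    total = sum(boosted.values())  # renormalize into a distribution
    return {a: s / total for a, s in boosted.items()}

# Hypothetical task: the correct answer is "42", but the initial policy
# puts most of its mass on wrong answers.
rng = random.Random(0)
policy = {"41": 0.45, "42": 0.10, "43": 0.45}
for _ in range(10):
    samples = sample_spectrum(policy, rng)
    policy = amplify_signal(policy, samples, is_correct=lambda a: a == "42")

print(round(policy["42"], 2))  # mass on the correct answer grows well past 0.5
```

In the real system the "spectrum" comes from sampling diverse chains of thought from the language model and the reweighting is done by a full RL algorithm, but the dynamic is the same: exploration first, then amplification of whatever the verifier confirms.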
Why does it matter?
This work is important because it demonstrates that you don't necessarily need a huge, expensive model to get good results in reasoning tasks. VibeThinker-1.5B outperforms larger models on certain benchmarks, meaning smaller models can be just as effective, making advanced AI research more accessible and affordable for everyone.
Abstract
Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). This challenges the prevailing approach of scaling model parameters to enhance capabilities, as seen in models like DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework first employs a Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, followed by MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models like Magistral Medium and Claude Opus 4, and performs on par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses the 400x larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). This is a substantial improvement over its base model's scores of 6.7, 4.3, and 0.6 on the same benchmarks. On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium's 50.3 and its base model's 0.0. These findings demonstrate that small models can achieve reasoning capabilities comparable to large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.