
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Zhihong Shao, Yuxiang Luo, Chengda Lu, Z. Z. Ren, Jiewen Hu, Tian Ye, Zhibin Gou, Shirong Ma, Xiaokang Zhang

2025-12-01

Summary

This paper focuses on improving how well large language models (LLMs) can perform mathematical reasoning, specifically tasks like theorem proving. It shows how to build a system that not only *gets* the right answers but also *shows its work* in a logically sound way.

What's the problem?

While LLMs have gotten much better at solving math problems and even competing in math contests, simply rewarding them for correct final answers isn't enough. A correct answer doesn't mean the reasoning used to get there was actually valid. Many advanced math problems require a detailed, step-by-step proof, and a final answer reward doesn't help with that. The biggest challenge is making sure the LLM's reasoning is both accurate and complete, especially when dealing with problems that don't have known solutions.

What's the solution?

The researchers tackled this by creating a system with two main parts: a 'verifier' and a 'generator'. The verifier is an LLM trained to check the correctness of mathematical proofs. The generator creates the proofs, but it's encouraged to use the verifier to find and fix any errors *before* submitting its final answer. To keep the verifier effective as the generator gets smarter, they automatically created new, difficult-to-verify proofs and used those to further train the verifier. This creates a cycle of improvement for both parts of the system, resulting in a model called DeepSeekMath-V2.
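The cycle described above can be sketched in code. This is a minimal toy illustration, not the paper's actual training pipeline: every function name here (`verify`, `generate`, `training_round`) is a hypothetical stand-in, and the "verification" is a trivial string check rather than an LLM.

```python
# Toy sketch of the generator-verifier improvement cycle.
# All names and logic are illustrative stand-ins, not the paper's real API.

def verify(proof: str) -> float:
    """Hypothetical verifier: scores a proof's rigor in [0, 1].
    Here, a proof containing a flagged 'GAP' is considered flawed."""
    return 0.0 if "GAP" in proof else 1.0

def generate(problem: str, max_revisions: int = 3) -> str:
    """Hypothetical generator that self-verifies before finalizing:
    it drafts a proof, asks the verifier for issues, and revises
    until the verifier is satisfied or the revision budget runs out."""
    proof = f"Draft proof of {problem} with GAP"  # first draft has a flaw
    for _ in range(max_revisions):
        if verify(proof) == 1.0:
            break  # verifier found no remaining issues
        proof = proof.replace("GAP", "justified step")  # resolve the issue
    return proof

def training_round(problems: list[str]) -> list[tuple[str, float]]:
    """One round of the cycle: the verifier's score acts as the
    generator's reward, and proofs the verifier still rejects are
    collected as new hard examples for further verifier training."""
    hard_cases = []
    for p in problems:
        proof = generate(p)
        reward = verify(proof)  # verifier used as the reward model
        if reward < 1.0:
            hard_cases.append((proof, reward))
    return hard_cases
```

In this toy version the generator always converges after one revision; the point is only the control flow: generate, self-check against the verifier, revise, and feed whatever the verifier cannot yet handle back into verifier training.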

Why does it matter?

This work is important because it moves beyond just getting the right answer in math and focuses on the *process* of reasoning. This is crucial for advancing AI in fields like scientific research where understanding *how* a conclusion was reached is just as important as the conclusion itself. The ability for an AI to self-verify its reasoning is also key to tackling complex, open-ended problems where there isn't a pre-existing solution to check against.

Abstract

Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations. Pursuing higher final answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-level scores on IMO 2025 and CMO 2024 and a near-perfect 118/120 on Putnam 2024 with scaled test-time compute.