Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, Kun Gai
2025-04-09
Summary
This paper presents Leanabell-Prover, an AI system that solves formal math proofs by learning from examples and getting feedback on each attempt, like a student practicing while a teacher checks their work.
What's the problem?
Current AI systems for solving formal math proofs lag behind the newest AI reasoning models used for language tasks, and they struggle with really tricky proofs without lots of extra training.
What's the solution?
Leanabell-Prover trains existing proof-solving AI using a mix of real math problems and specially designed practice questions, then uses a reward system where the AI gets points for correct proofs and learns from mistakes.
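The "points for correct proofs" idea works on statement-proof pairs written in Lean 4, where a proof counts as correct only if the compiler accepts it. A minimal illustrative pair (not drawn from the paper's dataset) that the Lean 4 compiler would accept looks like:

```lean
-- Statement: addition of natural numbers is commutative.
-- Proof: supplied by the standard-library lemma Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A prover model generates the proof term after `:=`; the compiler either accepts it (reward) or rejects it (mistake to learn from).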
Why it matters?
This helps create AI tools that can solve advanced math problems faster and more accurately, like helping mathematicians with research or checking if computer code is logically correct.
Abstract
Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 code. However, ATP has not yet been revolutionized by the recent posttraining scaling demonstrated by OpenAI o1/o3 and DeepSeek R1. In this work, we investigate the entire posttraining pipeline of ATP, aiming to align it with breakthroughs in reasoning models for natural language. To begin, we continually train current ATP models on a hybrid dataset, which consists of numerous statement-proof pairs together with additional data aimed at incorporating cognitive behaviors that emulate human reasoning and hypothesis refinement. Next, we explore reinforcement learning using the outcome reward returned by the Lean 4 compiler. Through our designed continual training and reinforcement learning processes, we have successfully improved existing formal provers, including both DeepSeek-Prover-v1.5 and Goedel-Prover, achieving state-of-the-art performance in whole-proof generation. For example, we achieve a 59.8% pass rate (pass@32) on MiniF2F. This is an ongoing project, and we will progressively update our findings and release our data and training details.
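The outcome reward described in the abstract can be sketched as a binary signal from the Lean 4 compiler: 1.0 if the whole proof compiles, 0.0 otherwise. The sketch below is an assumption-laden illustration, not the paper's implementation; the `lean` command invocation, flags, and timeout handling are hypothetical stand-ins for however the project actually checks proofs.

```python
import os
import subprocess
import tempfile

def lean_outcome_reward(proof_source: str, timeout_s: int = 60) -> float:
    """Binary outcome reward for whole-proof generation.

    Returns 1.0 if the Lean 4 compiler accepts the proof, 0.0 otherwise.
    The paper only states that the compiler's verdict is used as the
    reward; this concrete invocation is an illustrative assumption.
    """
    # Write the candidate statement-proof pair to a temporary .lean file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".lean", delete=False
    ) as f:
        f.write(proof_source)
        path = f.name
    try:
        # Hypothetical check: run the `lean` binary on the file and use
        # its exit code as the pass/fail signal.
        result = subprocess.run(
            ["lean", path], capture_output=True, timeout=timeout_s
        )
        return 1.0 if result.returncode == 0 else 0.0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Timeouts and a missing compiler both count as failures.
        return 0.0
    finally:
        os.unlink(path)
```

In a reinforcement-learning loop, this reward would be computed per sampled proof and fed to the policy update; because the signal is sparse and binary, many samples per statement (cf. pass@32) are typically drawn.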