Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin

2025-02-12

Summary

This paper introduces Goedel-Prover, an open-source large language model (LLM) that writes formal, machine-checkable proofs for math problems. It's like having an assistant that not only solves a problem but also writes the solution in a language (Lean 4) that a computer can verify step by step.

What's the problem?

Teaching an AI system to write formal proofs requires many math statements and proofs that are already written in a formal language, and such data is scarce. Large problem collections such as Numina are written in natural language, so they cannot be used directly to train a prover.

What's the solution?

The researchers tackled the data shortage in two main steps. First, they trained statement formalizers that translate the natural-language problems from Numina into Lean 4, producing 1.64 million formal statements, and they used LLMs to check that each formal statement preserves the content of the original problem. Then, they built a large dataset of formal proofs iteratively: they trained a series of provers, where each prover proves many statements that the previous ones could not, and those new proofs are added to the training set for the next prover.

Why it matters?

This matters because the final prover outperforms all existing open-source models in whole-proof generation. It reaches a 57.6% success rate (Pass@32) on the miniF2F benchmark, solves 7 problems (Pass@512) on PutnamBench, ranking first on the leaderboard, and generates 29.7K formal proofs for Lean Workbook problems, nearly doubling the 15.7K produced by earlier works.

Abstract

We introduce Goedel-Prover, an open-source large language model (LLM) that achieves state-of-the-art (SOTA) performance in automated formal proof generation for mathematical problems. The key challenge in this field is the scarcity of formalized math statements and proofs, which we tackle in the following ways. We train statement formalizers to translate natural-language math problems from Numina into a formal language (Lean 4), creating a dataset of 1.64 million formal statements. LLMs are used to check that the formal statements accurately preserve the content of the original natural-language problems. We then iteratively build a large dataset of formal proofs by training a series of provers. Each prover succeeds in proving many statements that the previous ones could not, and these new proofs are added to the training set for the next prover. The final prover outperforms all existing open-source models in whole-proof generation. On the miniF2F benchmark, it achieves a 57.6% success rate (Pass@32), exceeding the previous best open-source model by 7.6 percentage points. On PutnamBench, Goedel-Prover successfully solves 7 problems (Pass@512), ranking first on the leaderboard. Furthermore, it generates 29.7K formal proofs for Lean Workbook problems, nearly doubling the 15.7K produced by earlier works.
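
The iterative dataset construction described in the abstract can be pictured as a simple loop: train a prover on the proofs collected so far, attempt the statements that are still unproved, and keep only the proofs that pass verification. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The names `expert_iteration`, `train`, `verify`, and the toy difficulty table are all hypothetical; in a real pipeline `verify` would presumably be a Lean 4 check of the candidate proof and `train` would fine-tune an LLM.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: a prover maps a formal statement to a candidate proof,
# or returns None when it has no proof to offer.
Prover = Callable[[str], Optional[str]]


def expert_iteration(
    statements: List[str],
    train: Callable[[Dict[str, str]], Prover],
    verify: Callable[[str, str], bool],
    seed_proofs: Dict[str, str],
    rounds: int,
) -> Dict[str, str]:
    """Grow a proof dataset by alternating training and proof search."""
    proofs = dict(seed_proofs)  # statement -> verified proof
    for _ in range(rounds):
        prover = train(proofs)  # train the next prover on all proofs so far
        for stmt in statements:
            if stmt in proofs:
                continue  # already proved in an earlier round
            candidate = prover(stmt)
            # Keep a proof only if the (assumed) verifier accepts it.
            if candidate is not None and verify(stmt, candidate):
                proofs[stmt] = candidate
    return proofs


# Toy stand-ins (assumptions for illustration only): each statement has a
# difficulty, and a prover's skill grows with the size of its training set.
difficulty = {"a": 1, "b": 2, "c": 3, "d": 10}


def toy_train(proofs: Dict[str, str]) -> Prover:
    skill = len(proofs) + 1  # fixed at training time

    def prover(stmt: str) -> Optional[str]:
        return f"proof:{stmt}" if difficulty[stmt] <= skill else None

    return prover


def toy_verify(stmt: str, proof: str) -> bool:
    return proof == f"proof:{stmt}"


result = expert_iteration(list(difficulty), toy_train, toy_verify, {}, rounds=3)
print(sorted(result))
```

In this toy run each round unlocks one additional statement, mirroring the claim that each prover proves statements the previous ones could not, while the hardest statement stays unproved.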
