GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
2025-10-14
Summary
This paper introduces a new way to train computer programs to automatically solve complex math problems, specifically those written in a formal language called Lean.
What's the problem?
Currently, training these problem-solving programs is difficult and expensive. Existing methods either use a lot of trial and error, which takes time and resources, or rely on a fixed set of problems. This limits the program's ability to learn and tackle truly challenging mathematical proofs, because it only ever sees a limited range of problem types and difficulties.
What's the solution?
The researchers developed a system called GAR, which stands for Generative Adversarial Reinforcement learning. Think of it like having two programs working together: one generates math problems, and the other tries to solve them. They compete against each other, with the problem generator creating increasingly difficult problems as the solver gets better. This creates a constantly evolving curriculum that helps the solver learn more efficiently and effectively. It's like a student being given problems that are just the right level of challenge to help them improve.
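The adversarial loop described above can be sketched as a toy simulation. This is a minimal illustration of the idea, not the paper's implementation: all class names, the difficulty/skill numbers, and the update rules are hypothetical, and the "verifier" is reduced to a coin flip whose odds depend on the gap between solver skill and problem difficulty.

```python
import random

class Composer:
    """Hypothetical problem generator with an adaptive difficulty level."""
    def __init__(self):
        self.difficulty = 1.0

    def propose(self):
        return self.difficulty

    def update(self, solved):
        # Adversarial signal: the composer is rewarded when the solver
        # fails, so it raises difficulty when problems are solved and
        # lowers it when they are not, tracking the solver's ability.
        self.difficulty += 0.1 if solved else -0.1
        self.difficulty = max(0.1, self.difficulty)

class Solver:
    """Hypothetical prover whose skill grows with verified successes."""
    def __init__(self):
        self.skill = 1.0

    def attempt(self, difficulty, rng):
        # Stand-in for "write a proof and have the verifier check it":
        # success probability shrinks as difficulty outpaces skill.
        return rng.random() < self.skill / (self.skill + difficulty)

    def update(self, solved):
        # The solver is reinforced only on verified successes.
        if solved:
            self.skill += 0.05

def train(steps=1000, seed=0):
    rng = random.Random(seed)
    composer, solver = Composer(), Solver()
    for _ in range(steps):
        problem = composer.propose()
        solved = solver.attempt(problem, rng)
        solver.update(solved)
        composer.update(solved)
    return composer.difficulty, solver.skill

difficulty, skill = train()
```

The key dynamic this sketch tries to show is the implicit curriculum: because the composer's difficulty rises only when the solver succeeds, the solver is kept near problems it can just barely solve, rather than wasting training on problems that are far too easy or far too hard.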
Why does it matter?
This work is important because it improves the performance of these automated math solvers, allowing them to prove more complex theorems. Beyond just math, the GAR approach provides a general framework for training programs to solve problems in any verifiable environment, meaning any situation where you can automatically check if an answer is correct. This could have applications in areas like software verification and artificial intelligence.
Abstract
Solving math problems through verifiable languages such as Lean has significantly impacted both the mathematics and computer science communities. Current state-of-the-art models are often trained with expensive online Reinforcement Learning (RL) or expert iteration. However, these approaches rely on fixed problem sets, which causes inefficient training and limits the model's ability to tackle complex problems. To overcome these limitations, we propose GAR: Generative Adversarial Reinforcement learning, a comprehensive RL training framework that jointly trains the problem composer and solver in an adversarial loop. GAR introduces an implicit curriculum learning mechanism, which aligns task difficulty with the prover's evolving capability. It thereby improves training efficiency and enables stronger performance in proving advanced theorems. Experiments show that with GAR training, Goedel-Prover-V2-8B and DeepSeek-Prover-V2-7B achieve an average relative improvement in pass@32 of 4.20% on the MiniF2F-Test benchmark, while DeepSeek-Prover-V2's pass@32 on ProofNet-Test increases from 22.58% to 25.81%. Beyond formal proving, GAR establishes a general RL paradigm for the co-evolution of problem generation and solving in verifiable environments.