Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving

Matthieu Zimmer, Xiaotong Ji, Rasul Tutunov, Anthony Bordg, Jun Wang, Haitham Bou Ammar

2025-07-04


Summary

This paper introduces Bourbaki, a new system that improves automated theorem proving by breaking large proof problems into smaller subgoals and using Monte Carlo Tree Search (MCTS) to explore the resulting proof steps efficiently. Because it generates and works on its own subgoals, it can solve complex math problems more effectively.

What's the problem?

Automated theorem proving is very difficult because proofs can be long and complicated, so AI systems often struggle to find the correct sequence of logical steps quickly and accurately.

What's the solution?

The researchers designed Bourbaki to generate its own goal-conditioned Markov Decision Processes (MDPs): the system sets smaller goals within the big problem and uses Monte Carlo Tree Search (MCTS) to explore multiple proof paths toward them efficiently. By focusing on subgoals and organizing the search around them, it achieves state-of-the-art results on a challenging benchmark called PutnamBench.
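The subgoal-plus-search idea can be illustrated with a minimal Python sketch. It is not Bourbaki's implementation and makes several assumptions: a toy puzzle (reach a target integer from a start integer using hypothetical `inc` and `dbl` actions) stands in for proof states, a hypothetical halving heuristic `propose_subgoals` stands in for the paper's subgoal generation, and the search is a textbook UCT variant of MCTS (UCB1 selection, expansion, random rollouts, backpropagation). Each subgoal defines its own goal-conditioned MDP with a sparse reward of 1 only when that subgoal is reached, and `prove` chains the per-subgoal plans.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy environment standing in for proof states: a state is a
# positive integer, each "tactic" maps it to a larger integer, and a goal g is
# reached when state == g. Since actions only increase the state, any
# state >= g is terminal (either the goal or an unrecoverable overshoot).
ACTIONS = {"inc": lambda s: s + 1, "dbl": lambda s: s * 2}


@dataclass
class Node:
    state: int
    parent: Optional["Node"] = None
    action: Optional[str] = None
    children: list = field(default_factory=list)
    untried: list = field(default_factory=lambda: list(ACTIONS))
    visits: int = 0
    value: float = 0.0


def rollout(state: int, goal: int, rng: random.Random) -> float:
    # Default policy: random actions until a terminal state (state >= goal).
    while state < goal:
        state = ACTIONS[rng.choice(list(ACTIONS))](state)
    # Goal-conditioned sparse reward: 1 only when the goal is hit exactly.
    return 1.0 if state == goal else 0.0


def uct_child(node: Node, c: float) -> Node:
    # UCB1: mean value (exploitation) plus an exploration bonus.
    def ucb(ch: Node) -> float:
        return ch.value / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits)
    return max(node.children, key=ucb)


def mcts(start: int, goal: int, iterations: int = 2000, c: float = 1.4,
         seed: int = 0) -> Optional[list]:
    """Search the MDP conditioned on `goal`; return an action list or None."""
    if start == goal:
        return []
    rng = random.Random(seed)
    root = Node(start)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.state < goal:
            node = uct_child(node, c)
        # 2. Expansion: add one untried action below a non-terminal node.
        if node.untried and node.state < goal:
            action = node.untried.pop()
            node.children.append(Node(ACTIONS[action](node.state), node, action))
            node = node.children[-1]
            if node.state == goal:
                # Stop as soon as the goal is reached; walk back for the actions.
                actions, n = [], node
                while n.parent is not None:
                    actions, n = [n.action] + actions, n.parent
                return actions
        # 3. Simulation, then 4. backpropagation of the rollout reward.
        r = rollout(node.state, goal, rng)
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return None


def propose_subgoals(start: int, target: int) -> list:
    # Hypothetical subgoal generator (repeated halving), NOT Bourbaki's method.
    chain = [target]
    while chain[-1] // 2 > start:
        chain.append(chain[-1] // 2)
    return chain[::-1]


def prove(start: int, target: int) -> Optional[list]:
    # Solve one goal-conditioned search per subgoal and chain the plans.
    plan, state = [], start
    for subgoal in propose_subgoals(start, target):
        steps = mcts(state, subgoal)
        if steps is None:
            return None
        plan, state = plan + steps, subgoal
    return plan


def replay(start: int, plan: list) -> int:
    # Check a plan by executing its actions from the start state.
    for action in plan:
        start = ACTIONS[action](start)
    return start
```

For example, `prove(1, 10)` searches for 2, then 5, then 10, and `replay(1, plan)` confirms that the chained plan lands on 10. Splitting the target this way means each individual search only has to reach a nearby subgoal, which keeps every tree small.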

Why does it matter?

This matters because better automated theorem proving speeds up discovering and verifying mathematical truths, which can help advance science, engineering, and AI research by making complex reasoning tasks easier for computers.

Abstract

A new framework using self-generated goal-conditioned MDPs and Monte Carlo Tree Search improves automated theorem proving by generating and pursuing subgoals, achieving state-of-the-art results on PutnamBench.
