Semantic Search over 9 Million Mathematical Theorems
Luke Alexander, Eric Leonen, Sophie Szeto, Artemii Remizov, Ignacio Tejeda, Giovanni Inchiostro, Vasily Ilin
2026-02-06
Summary
This paper focuses on making it easier to find specific mathematical theorems and results. Current search tools usually just give you entire research papers, but mathematicians often need to pinpoint a precise theorem to help with their work or to prove new things.
What's the problem?
Finding specific theorems within the vast amount of mathematical literature is really hard. Existing search methods aren't designed to understand the *meaning* of mathematical statements, so they struggle to deliver exactly what someone is looking for. It's also not well understood how well modern 'semantic search' techniques, which try to understand meaning, work when applied to the complex language of mathematics and a huge collection of theorems.
What's the solution?
The researchers created a large database of over 9 million theorems gathered from various sources like arXiv. They then used natural language descriptions of these theorems and tested different ways to represent them using computer models. They experimented with different models and techniques to see which ones were best at retrieving the correct theorem when given a search query. They also had professional mathematicians write search queries to test the system's accuracy.
Why it matters?
This work shows that it *is* possible to build a system that can effectively search for specific theorems at a large scale. This is important because it could significantly speed up mathematical research, help students learn, and even assist computer programs that try to automatically prove theorems. The tools and dataset they created are publicly available, allowing others to build upon their work.
Abstract
Searching for mathematical results remains difficult: most existing tools retrieve entire papers, while mathematicians and theorem-proving agents often seek a specific theorem, lemma, or proposition that answers a query. While semantic search has seen rapid progress, its behavior on large, highly technical corpora such as research-level mathematical theorems remains poorly understood. In this work, we introduce and study semantic theorem retrieval at scale over a unified corpus of 9.2 million theorem statements extracted from arXiv and seven other sources, representing the largest publicly available corpus of human-authored, research-level theorems. We represent each theorem with a short natural-language description as a retrieval representation and systematically analyze how representation context, language model choice, embedding model, and prompting strategy affect retrieval quality. On a curated evaluation set of theorem-search queries written by professional mathematicians, our approach substantially improves both theorem-level and paper-level retrieval compared to existing baselines, demonstrating that semantic theorem search is feasible and effective at web scale. The theorem search tool is available at https://huggingface.co/spaces/uw-math-ai/theorem-search{this link}, and the dataset is available at https://huggingface.co/datasets/uw-math-ai/TheoremSearch{this link}.