zELO: ELO-inspired Training Method for Rerankers and Embedding Models
Nicholas Pipitone, Ghita Houir Alami, Advaith Avadhanam, Anton Kaminskyi, Ashley Khoo
2025-09-17
Summary
This paper introduces zELO, a new method for training retrieval systems to rank the most relevant information, and uses it to build two powerful, freely available reranking models: zerank-1 and zerank-1-small.
What's the problem?
When you search for something online, the system needs to rank results from most to least relevant. Traditional methods for training these ranking systems can be complex and often require a lot of labeled data, which is expensive and time-consuming to create. Existing, high-performing systems are often 'closed-source,' meaning you can't see or modify how they work.
What's the solution?
The researchers observed that ranking problems are fundamentally equivalent to a statistical model called a Thurstone model, which is used in psychology to analyze preferences. They developed zELO, a training method based on this connection, which lets them train rerankers from a large amount of *unlabeled* data – just raw text from queries and documents. They trained zerank-1 and zerank-1-small on 112,000 search queries, each paired with about 100 candidate documents, and the whole process took fewer than 10,000 hours on H100 GPUs.
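The summary does not spell out the exact update rule zELO uses, but the core idea of turning pairwise preferences into ranking scores can be illustrated with a classic ELO update. In this minimal sketch (all names and data are hypothetical), each `(winner, loser)` pair represents a judgment that one document answers a query better than another, and repeated ELO updates converge toward a Thurstone/Bradley-Terry-style ordering:

```python
def elo_ratings(pairs, k=32, rounds=20):
    """Estimate ELO-style scores from pairwise preference outcomes.

    pairs: list of (winner, loser) document ids, e.g. judgments of
    which of two documents better answers a given query.
    Repeated passes over the comparisons pull the ratings toward a
    consistent ordering of the documents.
    """
    ratings = {}
    for _ in range(rounds):
        for winner, loser in pairs:
            rw = ratings.setdefault(winner, 1000.0)
            rl = ratings.setdefault(loser, 1000.0)
            # Expected win probability under the logistic (ELO) model.
            expected = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
            # Winner gains, loser loses, proportional to the surprise.
            ratings[winner] = rw + k * (1.0 - expected)
            ratings[loser] = rl - k * (1.0 - expected)
    return ratings

# Hypothetical pairwise judgments for three documents on one query.
pairs = [("docA", "docB"), ("docA", "docC"), ("docB", "docC")]
scores = elo_ratings(pairs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sorting by the resulting scores yields a full ranking of the candidate documents, which is the quantity a reranker is ultimately trained to produce.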
Why it matters?
These new models, zerank-1 and zerank-1-small, are better than many existing ranking systems, even those that are proprietary and not publicly available, across a variety of fields like finance, law, computer code, and science. Importantly, because they are 'open-weight,' anyone can use, study, and improve them. They also work well even when given search tasks they haven't specifically been trained on, making them very versatile.
Abstract
We introduce a novel training methodology named zELO, which optimizes retrieval performance via the observation that ranking tasks are statistically equivalent to a Thurstone model. Based on the zELO method, we use unsupervised data to train a suite of state-of-the-art open-weight reranker models: zerank-1 and zerank-1-small. These models achieve the highest retrieval scores in multiple domains, including finance, legal, code, and STEM, outperforming closed-source proprietary rerankers on both NDCG@10 and Recall. They also demonstrate great versatility, maintaining their 0-shot performance on out-of-domain and private customer datasets. The models were trained end-to-end from unannotated queries and documents, using 112,000 queries with 100 documents per query, in less than 10,000 H100-hours.