Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Zhiyuan Peng, Ting-ruen Wei, Tingyu Song, Yilun Zhao, Yi Fang
2025-07-09
Summary
This paper introduces a new way to measure how efficient and effective large language models (LLMs) are when they are used to reorder search results, a task called reranking. The method defines metrics based on the amount of computation (measured in FLOPs) needed to produce good rankings and to process many queries.
What's the problem?
Current ways to evaluate rerankers look at things like latency or the number of model calls, but these depend heavily on the specific hardware and setup used. That makes it hard to compare systems fairly or to understand the true trade-off between how good the results are and how much computing power they consume.
What's the solution?
The researchers created E²R-FLOPs, a set of hardware-independent metrics: RPP (ranking metrics per PetaFLOP) captures relevance per unit of compute, and QPP (queries per PetaFLOP) captures how many queries a reranker can process per unit of compute. They also developed an interpretable way to estimate a reranker's computing cost without actually running it, and used these tools to evaluate a wide range of rerankers and expose their efficiency-effectiveness trade-offs.
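To make the trade-off concrete, here is a minimal sketch of how such metrics could be computed. The exact formulas in the paper may differ; this version assumes RPP is a relevance score (e.g., NDCG) divided by total PetaFLOPs, QPP is queries divided by total PetaFLOPs, and the forward-pass cost is approximated with the common rule of thumb of roughly 2 FLOPs per model parameter per processed token.

```python
# Illustrative sketch of E²R-FLOPs-style metrics (names and formulas are
# assumptions for exposition, not the paper's exact definitions).

def estimate_flops(num_params: float, num_tokens: int) -> float:
    """Rough forward-pass cost for a transformer reranker:
    ~2 FLOPs per parameter per processed token (a common rule of thumb)."""
    return 2.0 * num_params * num_tokens

def rpp(relevance: float, total_flops: float) -> float:
    """Relevance (e.g., NDCG@10 over an evaluation set) per PetaFLOP."""
    return relevance / (total_flops / 1e15)

def qpp(num_queries: int, total_flops: float) -> float:
    """Queries processed per PetaFLOP of compute."""
    return num_queries / (total_flops / 1e15)

# Hypothetical example: a 7B-parameter pointwise reranker scoring
# 100 candidate passages of ~256 tokens each, over 1,000 queries.
total = estimate_flops(7e9, 256 * 100) * 1000
print(rpp(0.45, total))  # relevance per PetaFLOP
print(qpp(1000, total))  # queries per PetaFLOP
```

Because both metrics are derived from an analytic FLOP count rather than measured wall-clock time, two rerankers can be compared without running them on identical hardware.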
Why it matters?
This matters because having clear, fair, and universal ways to measure reranker performance helps researchers and developers build better and more efficient search systems that deliver high-quality results while using less computing power.
Abstract
E²R-FLOPs metrics, including RPP and QPP, provide a hardware-agnostic evaluation of LLM-based rerankers' efficiency and effectiveness.