LimRank: Less is More for Reasoning-Intensive Information Reranking

Tingyu Song, Yilun Zhao, Siyue Zhang, Chen Zhao, Arman Cohan

2025-10-28

Summary

This paper focuses on making large language models (LLMs) better at re-ordering search results to show the most relevant information first, a task called information reranking.
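To make the task concrete, here is a minimal sketch of pointwise reranking: score each (query, document) pair and sort documents by score. The scorer below is a toy keyword-overlap stand-in, not the paper's LLM-based model, which reasons over the pair before scoring.

```python
def toy_relevance_score(query: str, document: str) -> float:
    """Toy proxy for relevance: fraction of query terms found in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms) / max(len(query_terms), 1)

def rerank(query: str, documents: list[str]) -> list[str]:
    """Return documents ordered from most to least relevant to the query."""
    return sorted(documents, key=lambda d: toy_relevance_score(query, d), reverse=True)

docs = [
    "A recipe for sourdough bread",
    "Fine-tuning language models for retrieval",
    "Retrieval and reranking with language models",
]
print(rerank("reranking with language models", docs)[0])
# → "Retrieval and reranking with language models"
```

An actual reasoning-intensive reranker replaces `toy_relevance_score` with an LLM call, which is exactly the step that fine-tuning adapts.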

What's the problem?

Currently, adapting LLMs for reranking requires a lot of computing power because it involves 'fine-tuning' them with massive amounts of data. This is expensive and time-consuming, making it hard for many researchers to work with these powerful models for this specific task.

What's the solution?

The researchers created a system called LIMRANK-SYNTHESIZER that automatically generates realistic and challenging reranking examples. They then used this generated data to fine-tune a new reranker model, LIMRANK. This approach allows LIMRANK to perform well while training on less than 5% of the data that previous methods needed.
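A synthetic training example for a reranker typically pairs a query with a relevant passage and a hard negative (a passage that looks relevant but isn't). The sketch below shows one hypothetical JSONL record of this shape; the field names and actual schema used by LIMRANK-SYNTHESIZER are assumptions for illustration.

```python
import json

# Hypothetical shape of one synthetic reranking example; the real
# LIMRANK-SYNTHESIZER output format may differ.
example = {
    "query": "Why does dropout act as a regularizer?",
    "positive": "Dropout randomly zeroes activations, discouraging co-adaptation of features.",
    "hard_negative": "Dropout is a technique applied during neural network training.",
}

line = json.dumps(example)          # one line of JSONL fine-tuning data
record = json.loads(line)           # round-trips cleanly
print(record["query"])
```

Making the hard negatives genuinely challenging is what lets a small dataset teach the model fine-grained relevance judgments.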

Why it matters?

This work is important because it shows that you don't need huge amounts of data and computing resources to get good performance from LLMs for reranking. This makes the technology more accessible and opens the door to applying it to more areas, like finding relevant scientific papers or improving the accuracy of systems that use information retrieval to answer questions.

Abstract

Existing approaches typically rely on large-scale fine-tuning to adapt LLMs for information reranking tasks, which is computationally expensive. In this work, we demonstrate that modern LLMs can be effectively adapted using only minimal, high-quality supervision. To enable this, we design LIMRANK-SYNTHESIZER, a reusable and open-source pipeline for generating diverse, challenging, and realistic reranking examples. Using this synthetic data, we fine-tune our reranker model, LIMRANK. We evaluate LIMRANK on two challenging benchmarks, i.e., BRIGHT for reasoning-intensive retrieval and FollowIR for instruction-following retrieval. Our experiments demonstrate that LIMRANK achieves competitive performance, while being trained on less than 5% of the data typically used in prior work. Further ablation studies demonstrate the effectiveness of LIMRANK-SYNTHESIZER and the strong generalization capabilities of LIMRANK across downstream tasks, including scientific literature search and retrieval-augmented generation for knowledge-intensive problem solving.