Less LLM, More Documents: Searching for Improved RAG
Jingjie Ning, Yibo Kong, Yunfan Long, Jamie Callan
2025-10-06
Summary
This paper investigates a way to improve the performance of Retrieval-Augmented Generation (RAG) systems, which combine information retrieval with powerful language models, without necessarily making the language model itself bigger and more expensive.
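The RAG loop the paper builds on can be sketched in a few lines. This is a toy illustration with hypothetical names (`retrieve`, `generate`, a word-overlap scorer), not the authors' implementation; a real system would use a trained retriever and prompt an LLM with the retrieved passages.

```python
# Minimal RAG sketch (hypothetical, for illustration only):
# retrieve top-k passages for a query, then generate an answer from them.

def tokens(text):
    """Lowercase word set with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query, corpus, k=2):
    """Score passages by word overlap with the query; return the top k."""
    q = tokens(query)
    return sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)[:k]

def generate(query, passages):
    """Stand-in for the LLM: a real system would prompt a model here."""
    context = " ".join(passages)
    return f"Answer to '{query}' grounded in: {context}"

corpus = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "France borders Spain and Germany.",
]
top = retrieve("What is the capital of France?", corpus)
print(generate("What is the capital of France?", top))
```

The paper's question is which knob to turn in this loop: grow the model behind `generate`, or grow the `corpus` that `retrieve` searches.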
What's the problem?
RAG systems are great, but making them better usually means using larger and more complex language models. These bigger models are costly to run and harder to deploy in real-world applications. The core issue is finding a way to improve RAG performance *without* constantly increasing the size and expense of the language model.
What's the solution?
The researchers explored whether increasing the amount of information the RAG system searches through – essentially, making the 'corpus' of documents larger – could be a substitute for using a larger language model. They found that expanding the corpus consistently improved performance, though with diminishing returns at larger scales, and could often achieve similar results to using a bigger model. They also analyzed *why* this worked, discovering it was mainly because a larger corpus meant a higher chance of finding the passages containing the correct answers.
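The coverage effect behind this finding can be made concrete with a toy metric: the fraction of questions for which the corpus contains at least one answer-bearing passage. This sketch is illustrative only (hypothetical data and a simple substring check, not the paper's evaluation code), but it shows why a larger corpus raises the ceiling on what retrieval can find.

```python
# Toy illustration of answer coverage: as the corpus grows, more
# questions have at least one passage that contains the gold answer.

def answer_coverage(questions, corpus):
    """Fraction of (question, answer) pairs whose answer string
    appears in at least one corpus passage (case-insensitive)."""
    hits = sum(
        any(ans.lower() in p.lower() for p in corpus)
        for _, ans in questions
    )
    return hits / len(questions)

questions = [
    ("Who wrote Hamlet?", "Shakespeare"),
    ("What is the capital of Japan?", "Tokyo"),
    ("Which element has the symbol Fe?", "iron"),
]
small = ["Hamlet is a tragedy by William Shakespeare."]
large = small + [
    "Tokyo is the capital of Japan.",
    "The chemical symbol Fe denotes iron.",
]
print(answer_coverage(questions, small))  # only 1 of 3 questions covered
print(answer_coverage(questions, large))  # all 3 questions covered
```

In the paper's terms, corpus scaling mostly moves this coverage number; how efficiently the generator *uses* a retrieved answer-bearing passage stays largely unchanged.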
Why it matters?
This research is important because it offers a practical way to build better RAG systems. Instead of always focusing on developing ever-larger language models, which are expensive and resource-intensive, developers can invest in creating larger and more comprehensive collections of documents. This provides a cost-effective alternative to improve RAG performance, especially for mid-sized language models, and establishes a clear trade-off between the size of the corpus and the size of the language model.
Abstract
Retrieval-Augmented Generation (RAG) couples document retrieval with large language models (LLMs). While scaling generators improves accuracy, it also raises cost and limits deployability. We explore an orthogonal axis: enlarging the retriever's corpus to reduce reliance on large LLMs. Experimental results show that corpus scaling consistently strengthens RAG and can often serve as a substitute for increasing model size, though with diminishing returns at larger scales. Small- and mid-sized generators paired with larger corpora often rival much larger models with smaller corpora; mid-sized models tend to gain the most, while tiny and large models benefit less. Our analysis shows that improvements arise primarily from increased coverage of answer-bearing passages, while utilization efficiency remains largely unchanged. These findings establish a principled corpus-generator trade-off: investing in larger corpora offers an effective path to stronger RAG, often comparable to enlarging the LLM itself.