Spectrum Projection Score: Aligning Retrieved Summaries with Reader Models in Retrieval-Augmented Generation

Zhanghao Hu, Qinglin Zhu, Siya Qi, Yulan He, Hanqi Yan, Lin Gui

2025-08-12

Summary

This paper introduces the Spectrum Projection Score (SPS), a new way to measure how well the summaries retrieved to help a large language model (LLM) match what the model understands internally. It is a simple, training-free check of whether retrieved information fits what the model is prepared to use, improving how the model combines retrieved knowledge with its own generation.

What's the problem?

The problem is that current retrieval-augmented generation (RAG) systems, which help an AI generate answers by pulling in external information, are usually evaluated as a whole, so it is hard to tell how much the retrieval step actually helps. On top of that, the reader models that consume this information are very sensitive to the exact wording of their prompts, which makes it difficult to measure how relevant and useful a retrieved summary is to the reader separately from how good the retriever is.

What's the solution?

The paper proposes the Spectrum Projection Score, which lets the reader model judge how semantically aligned a summary is by comparing the representations of the tokens it generates from that summary against the principal directions of the reader's internal representation space. This yields a relevance score without any extra training (a toy sketch of the idea appears below). On top of this, the authors present xCompress, an inference-time controller that samples candidate summaries, ranks them with SPS, and compresses the best ones before feeding them to the reader, improving efficiency while keeping quality high. They tested the approach on multiple benchmarks and open-source models, showing it works well across different tasks.
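
To make this concrete, here is a minimal, hedged sketch of a spectrum-projection-style score in Python. The function name, the choice of SVD to estimate the reader's principal directions, the use of a projected-energy ratio in place of the paper's area-based quantity, and all inputs are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def spectrum_projection_score(token_states: np.ndarray,
                              reader_states: np.ndarray,
                              k: int = 8) -> float:
    """Toy alignment score in the spirit of SPS (illustrative only).

    token_states:  (t, d) hidden states of tokens the reader generates
                   from a retrieved summary.
    reader_states: (n, d) hidden states sampled from the reader, used to
                   estimate its principal subspace.
    k:             number of principal directions to keep.
    """
    # Estimate the reader's principal directions via SVD of centered states.
    centered = reader_states - reader_states.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                       # (k, d) top-k principal directions

    # Project the summary-conditioned token states onto that subspace and
    # measure how much of their energy the subspace captures.
    proj = token_states @ basis.T        # (t, k) subspace coordinates
    energy_in = np.linalg.norm(proj, axis=1) ** 2
    energy_all = np.linalg.norm(token_states, axis=1) ** 2
    return float(np.mean(energy_in / (energy_all + 1e-12)))

# Example with random arrays standing in for real hidden states:
rng = np.random.default_rng(0)
print(spectrum_projection_score(rng.normal(size=(32, 64)),
                                rng.normal(size=(256, 64)), k=8))

A summary whose generated tokens lie mostly inside the reader's principal subspace scores near 1 under this toy rule, while one that pushes the reader off its usual directions scores lower, which is the intuition behind using the score for relevance.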

Why it matters?

This matters because helping a model decide which retrieved information fits its internal knowledge can lead to more accurate and relevant answers. By measuring and controlling how retrieval and generation interact, the method can improve the reliability and performance of AI systems that depend on searching large collections of information, benefiting applications such as question answering and research.

Abstract

Large Language Models (LLMs) have shown improved generation performance through retrieval-augmented generation (RAG) following the retriever-reader paradigm, which supplements model inputs with externally retrieved knowledge. However, prior work often evaluates RAG holistically, assessing the retriever and reader jointly, making it difficult to isolate the true contribution of retrieval, particularly given the prompt sensitivity of LLMs used as readers. We introduce Spectrum Projection Score (SPS), a lightweight, supervision-free metric that allows the reader to gauge the semantic alignment of a retrieved summary with its hidden representation by comparing the area formed by the generated tokens from the summary with the principal directions of the subspace in the reader, as a measure of relevance. Building on SPS, we present xCompress, an inference-time controller framework that dynamically samples, ranks, and compresses retrieval summary candidates. Extensive experiments on five QA benchmarks with four open-source LLMs show that SPS not only enhances performance across a range of tasks but also provides a principled perspective on the interaction between retrieval and generation.
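
For intuition only, the sample-rank-compress loop that xCompress performs at inference time might be organized as below. Everything here (the names, signatures, top-k selection rule, and placeholder compressor) is a schematic assumption rather than the paper's actual controller.

from typing import Callable, List

def xcompress_style_controller(candidates: List[str],
                               score_fn: Callable[[str], float],
                               compress_fn: Callable[[str], str],
                               top_k: int = 2) -> str:
    """Hypothetical sample-rank-compress loop in the spirit of xCompress.

    candidates:  retrieval summary candidates sampled upstream.
    score_fn:    alignment scorer, e.g. an SPS-style metric.
    compress_fn: summary compressor (placeholder for the paper's component).
    """
    # Rank candidates by how well they align with the reader model.
    ranked = sorted(candidates, key=score_fn, reverse=True)
    kept = ranked[:top_k]                # keep the best-aligned candidates
    # Compress the kept context before handing it to the reader.
    return compress_fn(" ".join(kept))

The design choice this illustrates is that ranking happens from the reader's point of view (via the alignment score) rather than the retriever's, so the compressed context is tailored to what the reader can actually use.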