SAGE: A Framework of Precise Retrieval for RAG

Jintao Zhang, Guoliang Li, Jinyang Su

2025-03-10

Summary

This paper introduces SAGE, a framework that improves how AI models find and use information to answer questions by making the retrieval step more accurate and efficient.

What's the problem?

Current methods for Retrieval-Augmented Generation (RAG) struggle to find the right information for two reasons: they split text into chunks without considering meaning, and they often retrieve either too little or too much context. Both problems lead to errors in answering questions.

What's the solution?

The researchers created SAGE, which uses three main techniques: semantic segmentation to divide text into meaningful chunks, a dynamic chunk selection algorithm to pick only the most relevant pieces of information, and a self-feedback step in which the language model judges whether the retrieved context is excessive or lacking and adjusts the amount accordingly. These improvements help the AI avoid irrelevant or incomplete information while lowering costs.
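The self-feedback step can be pictured as a small loop. The following is a minimal sketch, not the paper's implementation: the function name `retrieve_with_feedback`, the `judge` callable (a stand-in for the language model's assessment), the verdict strings, and the parameters `start_k` and `max_rounds` are all assumptions made for illustration.

```python
from typing import Callable, List


def retrieve_with_feedback(
    ranked_chunks: List[str],
    judge: Callable[[List[str]], str],
    start_k: int = 3,
    max_rounds: int = 3,
) -> List[str]:
    """Grow or shrink the retrieved context based on a judge's verdict.

    `judge` is a hypothetical stand-in for an LLM call. It is assumed to
    return "lacking", "excessive", or any other string meaning "adequate".
    `ranked_chunks` is assumed to be sorted from most to least relevant.
    """
    if not ranked_chunks:
        return []
    # Clamp the starting size to the number of available chunks.
    k = max(1, min(start_k, len(ranked_chunks)))
    for _ in range(max_rounds):
        verdict = judge(ranked_chunks[:k])
        if verdict == "lacking" and k < len(ranked_chunks):
            k += 1  # add the next most relevant chunk
        elif verdict == "excessive" and k > 1:
            k -= 1  # drop the least relevant chunk currently kept
        else:
            break  # adequate, or no room left to adjust
    return ranked_chunks[:k]


# Toy judge (assumption): the context is "lacking" until it holds 4 chunks.
chunks = ["c1", "c2", "c3", "c4", "c5"]
context = retrieve_with_feedback(
    chunks, lambda ctx: "lacking" if len(ctx) < 4 else "adequate"
)
print(context)
```

Capping the loop with `max_rounds` is a design choice of this sketch: it bounds the number of extra model calls and prevents a judge that alternates between verdicts from looping indefinitely.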

Why does it matter?

This matters because SAGE makes AI systems better at answering questions accurately while using fewer resources. It can be applied to many areas where finding precise information quickly is important, such as research, customer service, or education. By improving RAG systems, SAGE helps make AI tools more reliable and efficient.

Abstract

Retrieval-augmented generation (RAG) has demonstrated significant proficiency in conducting question-answering (QA) tasks within a specified corpus. Nonetheless, numerous failure instances of RAG in QA still exist. These failures are not solely attributable to the limitations of Large Language Models (LLMs); instead, they predominantly arise from the retrieval of inaccurate information for LLMs due to two limitations: (1) Current RAG methods segment the corpus without considering semantics, which weakens the correlation between questions and segments and makes relevant context difficult to find. (2) There is a trade-off between missing essential context when less context is retrieved and getting irrelevant context when more context is retrieved. In this paper, we introduce a RAG framework (SAGE) to overcome these limitations. First, to address segmentation that ignores semantics, we propose to train a semantic segmentation model. This model is trained to segment the corpus into semantically complete chunks. Second, to ensure that only the most relevant chunks are retrieved while the irrelevant ones are ignored, we design a chunk selection algorithm that dynamically selects chunks based on how quickly the relevance score decreases, leading to a more relevant selection. Third, to further ensure the precision of the retrieved chunks, we propose letting LLMs assess whether retrieved chunks are excessive or lacking and then adjust the amount of context accordingly. Experiments show that SAGE outperforms baselines by 61.25% in the quality of QA on average. Moreover, by avoiding retrieving noisy context, SAGE lowers the cost of the tokens consumed in LLM inference and achieves a 49.41% enhancement in cost efficiency on average. Additionally, our work offers valuable insights for boosting RAG.
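To make the idea of selecting chunks by the decreasing speed of the relevance score concrete, here is a minimal sketch. It is not the algorithm from the paper: the function name `select_chunks`, the relative-drop rule, and the parameters `min_k` and `drop_ratio` are assumptions chosen for illustration, and the scores are made-up values.

```python
from typing import List, Tuple


def select_chunks(
    scored: List[Tuple[str, float]],
    min_k: int = 1,
    drop_ratio: float = 0.3,
) -> List[str]:
    """Keep top-ranked chunks until the relevance score drops sharply.

    `scored` holds (chunk, relevance_score) pairs. The cutoff rule is an
    assumption of this sketch: stop at the first chunk whose score falls
    by more than `drop_ratio` relative to the previous chunk's score,
    provided at least `min_k` chunks have already been kept.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    selected = [ranked[0][0]]
    # Walk adjacent pairs in rank order and measure each relative drop.
    for (_, prev_score), (chunk, score) in zip(ranked, ranked[1:]):
        if prev_score > 0:
            relative_drop = (prev_score - score) / prev_score
        else:
            relative_drop = 1.0  # treat non-positive scores as a full drop
        if len(selected) >= min_k and relative_drop > drop_ratio:
            break
        selected.append(chunk)
    return selected


# Made-up scores: three close scores, then a sharp fall before "d".
example = [("a", 0.92), ("b", 0.90), ("c", 0.88), ("d", 0.40), ("e", 0.35)]
print(select_chunks(example))  # prints ['a', 'b', 'c']
```

Unlike a fixed top-k cutoff, a rule of this kind lets the number of retrieved chunks vary per question, which is the trade-off between too little and too much context that the abstract describes.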