RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, Qi Ye, Zhicheng Dou

2024-12-17

Summary

This paper introduces RetroLLM, a new framework that improves how large language models (LLMs) generate text by letting them retrieve specific supporting evidence as part of the generation process itself. This helps reduce errors and makes the models more accurate in their answers.

What's the problem?

Large language models are great at generating text but often 'hallucinate', meaning they produce incorrect or nonsensical information. Current methods that help these models retrieve information from external sources have limitations: they require deploying a separate retrieval system, they waste input tokens on redundant retrieved text chunks, and they do not optimize the retrieval and generation processes together.

What's the solution?

RetroLLM combines the retrieval of information and the generation of text into one seamless process. Using constrained decoding, the model generates relevant evidence directly from the corpus rather than receiving it from a separate retriever. The framework includes two main strategies: hierarchical FM-Index constraints, which first generate corpus-constrained clues to narrow the search to a subset of relevant documents before evidence is generated, and a forward-looking decoding strategy that considers the relevance of future text to improve the accuracy of the evidence. This unified approach helps the model work more efficiently and accurately.
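
To make the constrained-decoding idea concrete, here is a minimal sketch of corpus-constrained evidence generation. It is a hypothetical illustration, not RetroLLM's actual implementation: a plain substring index over tokenized passages stands in for the paper's hierarchical FM-Index, and `score_fn` stands in for the LLM's next-token scores; all names are invented for the example.

```python
# Hypothetical sketch of corpus-constrained evidence decoding.
# A substring index stands in for the FM-Index used in the paper.
from collections import defaultdict

def build_substring_index(corpus, max_len=12):
    """Map each corpus substring (token tuple) to the tokens that can follow it."""
    starts, nxt = set(), defaultdict(set)
    for doc in corpus:
        for i in range(len(doc)):
            starts.add(doc[i])
            for j in range(i, min(i + max_len, len(doc)) - 1):
                nxt[tuple(doc[i:j + 1])].add(doc[j + 1])
    return starts, nxt

def constrained_greedy_decode(score_fn, starts, nxt, max_len=12):
    """Greedily pick the best-scoring token among corpus-consistent options,
    so the emitted evidence is always a contiguous span from the corpus."""
    evidence = []
    for _ in range(max_len):
        allowed = starts if not evidence else nxt.get(tuple(evidence), set())
        if not allowed:              # no corpus continuation: stop emitting
            break
        evidence.append(max(allowed, key=lambda t: score_fn(evidence, t)))
    return evidence

# Toy usage with word-level "tokens" and a stand-in scorer that favors
# an answer about France's capital (a real system would use LLM logits).
corpus = [["paris", "is", "the", "capital", "of", "france"],
          ["berlin", "is", "the", "capital", "of", "germany"]]
starts, nxt = build_substring_index(corpus)
score_fn = lambda prefix, tok: {"capital": 3, "of": 2, "france": 1}.get(tok, 0)
print(constrained_greedy_decode(score_fn, starts, nxt))  # ['capital', 'of', 'france']
```

Because every step is masked to continuations that actually occur in the corpus, the generated evidence can never drift into text that does not exist in the knowledge source, which is the core reason constrained decoding curbs hallucinated "quotes".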

Why it matters?

The RetroLLM framework is important because it enhances the ability of language models to provide precise and reliable information, which is crucial for question answering and other tasks where accuracy matters. By improving how these models retrieve and generate information, it can lead to better performance in real-world applications, making the technology more useful and trustworthy.

Abstract

Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose RetroLLM, a unified framework that integrates retrieval and generation into a single, cohesive process, enabling LLMs to directly generate fine-grained evidence from the corpus with constrained decoding. Moreover, to mitigate false pruning in the process of constrained evidence generation, we introduce (1) hierarchical FM-Index constraints, which generate corpus-constrained clues to identify a subset of relevant documents before evidence generation, reducing irrelevant decoding space; and (2) a forward-looking constrained decoding strategy, which considers the relevance of future sequences to improve evidence accuracy. Extensive experiments on five open-domain QA datasets demonstrate RetroLLM's superior performance across both in-domain and out-of-domain tasks. The code is available at https://github.com/sunnynexus/RetroLLM.
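
The forward-looking constrained decoding strategy mentioned in the abstract can be illustrated in the same toy setting as the sketch above. The code below is a hypothetical simplification, not the paper's algorithm: it reuses the `nxt` index from the earlier sketch, and `window_relevance`, the weighting, and the lookahead depth are all invented for the example. The idea shown is that a candidate next token is scored not only by the model's immediate preference but also by how relevant the corpus spans reachable through it are to the query, so the decoder avoids committing to prefixes that lead nowhere useful.

```python
# Hypothetical illustration of forward-looking scoring during constrained
# decoding. Reuses the `nxt` index built in the previous sketch.

def window_relevance(query_terms, window):
    """Toy relevance: fraction of query terms covered by a candidate window."""
    return len(query_terms & set(window)) / max(len(query_terms), 1)

def forward_looking_score(prefix, token, query_terms, nxt, score_fn,
                          lookahead=3, weight=1.0):
    """Model score for `token`, plus the best relevance of any corpus-consistent
    continuation (up to `lookahead` extra tokens) starting with prefix + token."""
    seed = list(prefix) + [token]
    best_future = window_relevance(query_terms, seed)
    frontier = [seed]
    for _ in range(lookahead):
        next_frontier = []
        for seq in frontier:
            for t in nxt.get(tuple(seq), ()):   # corpus-consistent extensions only
                ext = seq + [t]
                best_future = max(best_future, window_relevance(query_terms, ext))
                next_frontier.append(ext)
        if not next_frontier:
            break
        frontier = next_frontier
    return score_fn(prefix, token) + weight * best_future

# Rank the tokens that may follow "of" for the query "capital of france":
# looking ahead favors "france" over "germany" even with a neutral model score.
query_terms = {"capital", "france"}
neutral = lambda prefix, tok: 0.0
ranked = sorted(nxt[("of",)], reverse=True,
                key=lambda t: forward_looking_score(["of"], t, query_terms, nxt, neutral))
print(ranked)  # ['france', 'germany']
```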