Chain-of-Retrieval Augmented Generation

Liang Wang, Haonan Chen, Nan Yang, Xiaolong Huang, Zhicheng Dou, Furu Wei

2025-01-27

Summary

This paper introduces CoRAG (Chain-of-Retrieval Augmented Generation), a new way to make AI systems better at answering complex questions by letting them look up information in multiple steps, like a student doing research for a big project.

What's the problem?

Current AI systems that use Retrieval Augmented Generation (RAG) usually look up information only once before answering a question. This is like trying to write an essay after checking only one book in the library. A single lookup doesn't work well for complicated questions that need information from multiple sources.

What's the solution?

The researchers created CoRAG, which lets the AI look up information multiple times as it works toward an answer, rewording its search query at each step based on what it has found so far. They also came up with a clever way to train the AI using 'rejection sampling': the system generates many step-by-step search traces for existing question-answer pairs and keeps only the traces that lead to the correct answer, turning them into training examples. When actually using CoRAG, they designed different decoding strategies to control how much time and effort the AI spends on each question.
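To make the step-by-step lookup concrete, here is a minimal Python sketch of a chain-of-retrieval loop. The model and retriever objects, their method names, and the stopping condition are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch of CoRAG-style chain-of-retrieval inference.
# `model` and `retriever` are hypothetical stand-ins for a trained
# language model and a document retriever.

def corag_answer(question, model, retriever, max_steps=6):
    """Retrieve and reason step by step before producing a final answer."""
    chain = []  # accumulated (sub-query, documents, sub-answer) steps
    for _ in range(max_steps):
        # Reformulate the query based on the evolving state of the chain.
        sub_query = model.generate_subquery(question, chain)
        if sub_query is None:  # the model decides it has enough information
            break
        docs = retriever.retrieve(sub_query, top_k=5)  # look up documents
        sub_answer = model.generate_subanswer(sub_query, docs)
        chain.append((sub_query, docs, sub_answer))
    # Generate the final answer conditioned on the full retrieval chain.
    return model.generate_final_answer(question, chain)
```

The max_steps cap is one simple way to control how much time the model spends per question; raising or lowering it trades accuracy against compute.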

Why it matters?

This matters because it could make AI much better at answering complex questions that require piecing together information from different sources. In tests, CoRAG beat other systems by a wide margin, especially on questions that needed multiple steps to answer, improving exact-match scores by more than 10 points over strong baselines. This could lead to AI assistants that can help with more complicated tasks, like research or problem-solving, in a way that's closer to how humans think and find information.

Abstract

This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before the generation process, which limits their effectiveness in addressing complex queries due to imperfect retrieval results. In contrast, our proposed method, CoRAG (Chain-of-Retrieval Augmented Generation), allows the model to dynamically reformulate the query based on the evolving state. To train CoRAG effectively, we utilize rejection sampling to automatically generate intermediate retrieval chains, thereby augmenting existing RAG datasets that only provide the correct final answer. At test time, we propose various decoding strategies to scale the model's test-time compute by controlling the length and number of sampled retrieval chains. Experimental results across multiple benchmarks validate the efficacy of CoRAG, particularly in multi-hop question answering tasks, where we observe more than 10 points improvement in EM score compared to strong baselines. On the KILT benchmark, CoRAG establishes a new state-of-the-art performance across a diverse range of knowledge-intensive tasks. Furthermore, we offer comprehensive analyses to understand the scaling behavior of CoRAG, laying the groundwork for future research aimed at developing factual and grounded foundation models.
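As a rough illustration of the rejection-sampling step described in the abstract, the sketch below samples several candidate retrieval chains for a question with a known answer and keeps only the chains that end in that answer. The sampling methods, chain-length choice, and exact-match acceptance check are assumptions for illustration, not the paper's actual procedure.

```python
import random

def build_training_chains(question, gold_answer, model, retriever,
                          num_candidates=8, max_steps=6):
    """Rejection sampling: keep only candidate retrieval chains whose
    final answer matches the known correct answer.

    Hypothetical sketch; `model` and `retriever` stand in for the
    paper's actual components.
    """
    accepted = []
    for _ in range(num_candidates):
        chain = []
        for _ in range(random.randint(1, max_steps)):  # vary chain length
            sub_query = model.sample_subquery(question, chain)  # stochastic decode
            docs = retriever.retrieve(sub_query, top_k=5)
            sub_answer = model.sample_subanswer(sub_query, docs)
            chain.append((sub_query, docs, sub_answer))
        final = model.generate_final_answer(question, chain)
        if final.strip().lower() == gold_answer.strip().lower():
            accepted.append(chain)  # becomes intermediate supervision
    return accepted
```

Accepted chains give the model step-level supervision it would not otherwise get from RAG datasets that only label the correct final answer.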