Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Guanting Dong, Xiaoshuai Song, Yutao Zhu, Runqi Qiao, Zhicheng Dou, Ji-Rong Wen
2024-10-15

Summary
This paper introduces VIF-RAG, a new system designed to improve how well large language models (LLMs) follow instructions when generating content, especially in tasks that involve retrieving information.
What's the problem?
As AI technology advances, it is important for models to accurately follow human instructions so they produce useful and relevant content. However, current methods for assessing and improving instruction-following, especially in retrieval-augmented settings, are limited and often resource-intensive. This makes it difficult to ensure that models can handle a variety of user requests effectively.
What's the solution?
VIF-RAG introduces an automated, scalable, and verifiable pipeline for improving instruction-following in retrieval-augmented generation (RAG) systems. It starts from a small, manually crafted set of atomic instructions (fewer than 100) and applies combination rules to synthesize more complex ones. Supervised models then rewrite these instructions, while automatically generated verification code, run through a Python executor, checks their quality. The verified instructions are merged with large-scale RAG and general data samples to build the VIF-RAG-QA dataset (over 100k examples). The authors also developed the FollowRAG Benchmark, which includes around 3,000 test samples covering 22 categories of instruction constraints and four knowledge-intensive QA datasets, to evaluate how well LLMs follow different types of instructions across various knowledge areas.
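The executor-based verification step can be illustrated with a minimal sketch. The constraint names and checker functions below are hypothetical stand-ins (in the paper, the checkers are generated automatically rather than hand-written): each atomic instruction is paired with a small Python checker, and a combined instruction passes only if every atomic checker passes.

```python
# Minimal sketch of executor-based instruction verification.
# The checker functions here are hypothetical examples; VIF-RAG
# generates such verification code automatically per instruction.

def check_word_limit(response: str, max_words: int = 50) -> bool:
    """Atomic constraint: at most `max_words` words."""
    return len(response.split()) <= max_words

def check_contains_keyword(response: str, keyword: str = "RAG") -> bool:
    """Atomic constraint: must mention a required keyword."""
    return keyword.lower() in response.lower()

def check_ends_with_period(response: str) -> bool:
    """Atomic constraint: must end with a period."""
    return response.strip().endswith(".")

def verify(response: str, checkers) -> bool:
    """A combined instruction is satisfied only if every atomic
    checker passes, mirroring the pipeline's combination rules."""
    return all(checker(response) for checker in checkers)

checkers = [check_word_limit, check_contains_keyword, check_ends_with_period]
print(verify("RAG systems retrieve documents before generating answers.",
             checkers))  # True: all three constraints hold
```

Because each checker is executable, instruction quality can be validated at scale without human review, which is what makes the synthetic pipeline verifiable.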
Why it matters?
This research is significant because it enhances the ability of AI models to understand and respond to complex instructions, making them more effective in real-world applications. By providing a reliable way to evaluate and improve instruction-following capabilities, VIF-RAG can help develop AI systems that are more aligned with human needs and expectations.
Abstract
Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. Despite recent advancements in Large Language Models (LLMs), research on assessing and improving instruction-following (IF) alignment within the RAG domain remains limited. To address this issue, we propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems. We start by manually crafting a minimal set of atomic instructions (<100) and developing combination rules to synthesize and verify complex instructions for a seed set. We then use supervised models for instruction rewriting while simultaneously generating code to automate the verification of instruction quality via a Python executor. Finally, we integrate these instructions with extensive RAG and general data samples, scaling up to a high-quality VIF-RAG-QA dataset (>100k) through automated processes. To further bridge the gap in instruction-following auto-evaluation for RAG systems, we introduce FollowRAG Benchmark, which includes approximately 3K test samples, covering 22 categories of general instruction constraints and four knowledge-intensive QA datasets. Due to its robust pipeline design, FollowRAG can seamlessly integrate with different RAG benchmarks. Using FollowRAG and eight widely-used IF and foundational abilities benchmarks for LLMs, we demonstrate that VIF-RAG markedly enhances LLM performance across a broad range of general instruction constraints while effectively leveraging its capabilities in RAG scenarios. Further analysis offers practical insights for achieving IF alignment in RAG systems. Our code and datasets are released at https://FollowRAG.github.io.