SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model

Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Hanyu Wang, Feiyu Xiong, Jason Zhaoxin Fan, Bo Tang, Shichao Song, Mengwei Wang, Jiawei Yang

2025-02-04

Summary

This paper introduces SafeRAG, a benchmark designed to test how secure Retrieval-Augmented Generation (RAG) systems are. RAG systems combine external knowledge with large language models to improve their answers, but this connection can make them vulnerable to attacks. SafeRAG provides a way to evaluate and understand these security risks.

What's the problem?

RAG systems, which use external knowledge to make AI models more capable, can be attacked because the external data they rely on may be unverified or manipulated. This makes them vulnerable to issues like fake information being injected, conflicting data causing confusion, or even denial-of-service attacks that disrupt their functionality. These vulnerabilities can lead to poor performance and unreliable results, but until now there has not been a good way to measure or understand these risks.

What's the solution?

The researchers developed SafeRAG, a benchmark that evaluates the security of RAG systems by simulating different types of attacks. They categorized these attacks into four tasks: silver noise (plausible but uninformative text), inter-context conflict (contradictory passages), soft ads (subtle promotional content), and white denial-of-service (content that pushes the model to refuse to answer). Using a largely hand-constructed dataset, SafeRAG tests how well RAG systems handle these scenarios and identifies their weaknesses. Experiments on 14 representative RAG components showed that current systems are highly vulnerable to all types of attacks, and that even the most obvious ones can slip past existing retrievers, filters, and advanced LLMs.
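To make the attack simulation concrete, here is a minimal, hypothetical sketch of how injected attack text might be mixed into a RAG system's retrieved context before generation. This is not the SafeRAG implementation: the toy `retrieve`, `inject_attack`, and `generate` functions and the example attack payloads are illustrative stand-ins for the real retriever, poisoning procedure, and LLM.

```python
# Hypothetical sketch of a SafeRAG-style attack simulation.
# The retriever, generator, and attack texts below are illustrative placeholders,
# not the actual SafeRAG components or data.

import random

# The four attack tasks named in the paper, with toy example payloads.
ATTACK_TASKS = {
    "silver_noise": "Plausible but uninformative filler text about the topic.",
    "inter_context_conflict": "A passage that directly contradicts the genuine context.",
    "soft_ad": "A subtle promotional sentence embedded in otherwise normal text.",
    "white_dos": "Text designed to make the model refuse to answer the question.",
}

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    scored = sorted(
        corpus,
        key=lambda p: len(set(query.lower().split()) & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def inject_attack(contexts: list[str], attack_text: str, ratio: float = 0.5) -> list[str]:
    """Replace a fraction of the retrieved passages with attack text."""
    poisoned = contexts.copy()
    n_poison = max(1, int(len(poisoned) * ratio))
    for i in random.sample(range(len(poisoned)), n_poison):
        poisoned[i] = attack_text
    return poisoned

def generate(query: str, contexts: list[str]) -> str:
    """Placeholder for an LLM call: simply concatenates the context for illustration."""
    return f"Answer to '{query}' based on: " + " | ".join(contexts)

if __name__ == "__main__":
    corpus = [
        "RAG retrieves external documents to ground LLM answers.",
        "Retrieval quality strongly affects answer accuracy.",
        "Indexing splits documents into chunks before retrieval.",
    ]
    query = "How does RAG ground LLM answers?"
    clean_contexts = retrieve(query, corpus)
    for task, attack_text in ATTACK_TASKS.items():
        poisoned = inject_attack(clean_contexts, attack_text)
        print(task, "->", generate(query, poisoned))
```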

Why it matters?

This research is important because it highlights the security risks of using RAG systems and provides a way to test and improve them. As AI becomes more integrated into everyday applications like search engines or customer support, ensuring that these systems are secure and reliable is crucial. SafeRAG helps developers understand and fix vulnerabilities, making AI systems safer and more trustworthy for everyone.

Abstract

The indexing-retrieval-generation paradigm of retrieval-augmented generation (RAG) has been highly successful in solving knowledge-intensive tasks by integrating external knowledge into large language models (LLMs). However, the incorporation of external and unverified knowledge increases the vulnerability of LLMs because attackers can perform attack tasks by manipulating that knowledge. In this paper, we introduce a benchmark named SafeRAG designed to evaluate RAG security. First, we classify attack tasks into silver noise, inter-context conflict, soft ad, and white Denial-of-Service. Next, we construct a RAG security evaluation dataset (the SafeRAG dataset), primarily manually, for each task. We then use the SafeRAG dataset to simulate various attack scenarios that RAG may encounter. Experiments conducted on 14 representative RAG components demonstrate that RAG exhibits significant vulnerability to all attack tasks, and that even the most apparent attack task can easily bypass existing retrievers, filters, or advanced LLMs, resulting in the degradation of RAG service quality. Code is available at: https://github.com/IAAR-Shanghai/SafeRAG.
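As an illustration of what "degradation of RAG service quality" could look like in code, the sketch below compares answers generated from clean versus attacked contexts. The set-based token-overlap F1 and the example refusal string are assumptions made for demonstration; they are not the metrics or data used in the SafeRAG paper.

```python
# Hypothetical degradation measurement: compare an answer generated from clean
# contexts with one generated from attacked contexts. The scoring function is a
# simple set-based token-overlap proxy, not the evaluation metric from the paper.

def token_f1(prediction: str, reference: str) -> float:
    """Set-based token-overlap F1 between a prediction and a reference answer."""
    pred, ref = set(prediction.lower().split()), set(reference.lower().split())
    common = pred & ref
    if not pred or not ref or not common:
        return 0.0
    precision = len(common) / len(pred)
    recall = len(common) / len(ref)
    return 2 * precision * recall / (precision + recall)

def degradation(clean_answer: str, attacked_answer: str, reference: str) -> float:
    """Drop in answer quality caused by the attack (positive means worse)."""
    return token_f1(clean_answer, reference) - token_f1(attacked_answer, reference)

if __name__ == "__main__":
    reference = "RAG grounds LLM answers by retrieving external documents."
    clean = "RAG retrieves external documents to ground LLM answers."
    attacked = "I cannot answer this question."  # e.g. a white-DoS style refusal
    print(f"quality drop: {degradation(clean, attacked, reference):.2f}")
```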