A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain
Yining Lu, Wenyi Tang, Max Johnson, Taeho Jung, Meng Jiang
2025-11-18
Summary
This paper introduces a new way to build retrieval-augmented generation (RAG) systems, which combine information retrieval with large language models. Instead of relying on a single, central database, it proposes a decentralized system where information comes directly from many different data owners.
What's the problem?
Traditional RAG systems collect all their information in one place, which is expensive and raises privacy issues. A big challenge with getting information from many different sources is that some sources are more trustworthy or accurate than others. If you just pull from all sources equally, unreliable information can hurt the quality of the answers the system gives.
What's the solution?
The researchers created a system that automatically scores the reliability of each data source. It does this by checking how well the information from each source contributes to good answers. Sources that consistently help generate accurate responses get a higher score and are used more often when retrieving information. To make sure this scoring is fair and can’t be tampered with, they use blockchain technology, which creates a secure and transparent record of each source’s reliability.
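The scoring idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the exponential-moving-average update rule, the `alpha` and `initial` values, and the names `ReliabilityTracker` and `rank_passages` are all hypothetical, and the per-response quality signal is assumed to be a number in [0, 1] supplied by some external evaluator.

```python
from dataclasses import dataclass, field


@dataclass
class ReliabilityTracker:
    """Keeps one reliability score per data source (hypothetical design)."""
    alpha: float = 0.2     # assumed smoothing factor for the moving average
    initial: float = 0.5   # assumed neutral prior for unseen sources
    scores: dict = field(default_factory=dict)

    def get(self, source: str) -> float:
        return self.scores.get(source, self.initial)

    def update(self, source: str, quality: float) -> float:
        """Blend the latest response quality (0..1) into the source's score."""
        if not 0.0 <= quality <= 1.0:
            raise ValueError("quality must be in [0, 1]")
        new = (1 - self.alpha) * self.get(source) + self.alpha * quality
        self.scores[source] = new
        return new


def rank_passages(candidates, tracker, k=3):
    """Re-rank (source, passage, similarity) triples by similarity * reliability."""
    weighted = [
        (similarity * tracker.get(source), source, passage)
        for source, passage, similarity in candidates
    ]
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [(source, passage) for _, source, passage in weighted[:k]]


if __name__ == "__main__":
    tracker = ReliabilityTracker()
    for _ in range(5):
        tracker.update("source_a", 1.0)  # source_a keeps helping
        tracker.update("source_b", 0.0)  # source_b keeps hurting
    candidates = [
        ("source_b", "passage from b", 0.9),
        ("source_a", "passage from a", 0.8),
    ]
    print(rank_passages(candidates, tracker, k=1))
```

In this sketch a source with a slightly lower similarity score can still outrank a less reliable one, which is the prioritization behavior the summary describes. Multiplying similarity by reliability is only one possible way to combine the two signals.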
Why does it matter?
This research is important because it allows for more secure and cost-effective RAG systems. By letting data owners keep control of their information and scoring each source's reliability, the new approach outperforms a centralized system by 10.7% when the data is unreliable and comes close to the best-case performance of centralized systems when the data is reliable. Batching score updates also cuts the marginal cost of maintaining the on-chain records by approximately 56%. It opens the door to using a wider range of information sources without sacrificing quality.
Abstract
Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, causing a high cost of data collection, integration, and management, as well as privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to utilize information directly from data owners who maintain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system has a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of the responses it helps generate and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a +10.7% performance improvement over its centralized counterpart in real-world-like unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems under ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
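To make the ideas of tamper-evident records and batched updates concrete, here is a small Python sketch. It is not a smart contract and not the paper's system: it uses a plain hash chain as a stand-in for a blockchain, and the class name `ScoreLedger` and its methods are hypothetical. It shows only that several staged score updates can be committed as one record, and that altering a past record is detectable. It makes no attempt to model on-chain costs or to reproduce the reported savings.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record's predecessor


def _digest(prev_hash, updates):
    """Hash a record's contents together with the previous record's hash."""
    payload = json.dumps({"prev": prev_hash, "updates": updates}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ScoreLedger:
    """Hash-chained log of reliability scores (illustrative stand-in only)."""

    def __init__(self):
        self.blocks = []   # committed records, oldest first
        self.pending = {}  # staged updates awaiting the next commit

    def stage(self, source, score):
        """Queue a score update; later updates to the same source overwrite."""
        self.pending[source] = score

    def commit(self):
        """Write all staged updates as a single chained record."""
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        updates = dict(self.pending)
        block = {"prev": prev, "updates": updates, "hash": _digest(prev, updates)}
        self.blocks.append(block)
        self.pending = {}
        return block["hash"]

    def verify(self):
        """Recompute every hash; any edit to a past record breaks the chain."""
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev or _digest(prev, block["updates"]) != block["hash"]:
                return False
            prev = block["hash"]
        return True


if __name__ == "__main__":
    ledger = ScoreLedger()
    ledger.stage("source_a", 0.8)
    ledger.stage("source_b", 0.3)
    ledger.commit()                            # two updates, one write
    print(ledger.verify())
    ledger.blocks[0]["updates"]["source_b"] = 0.99  # tamper with history
    print(ledger.verify())
```

Batching here means one `commit` call covers many `stage` calls, so any fixed per-write overhead is shared across the updates in the batch. How that translates into actual cost on a given blockchain depends on the platform and contract design, which this sketch does not cover.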