Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Maojia Song, Shang Hong Sim, Rishabh Bhardwaj, Hai Leong Chieu, Navonil Majumder, Soujanya Poria
2024-09-18

Summary
This paper introduces a new way to measure and improve how trustworthy large language models (LLMs) are when used in retrieval-augmented generation (RAG) systems, which combine information retrieval with text generation.
What's the problem?
While LLMs are widely used in RAG systems, there hasn't been enough focus on how suitable they actually are for this task. Existing evaluations often overlook whether a model can refuse questions that the retrieved documents cannot answer, or whether its answers are actually grounded in those documents. This can lead to unreliable outputs, where the model generates incorrect or misleading information.
What's the solution?
The researchers developed a new metric called Trust-Score to evaluate the trustworthiness of LLMs in RAG systems. The score assesses how well a model decides when to answer and when to refuse a question, based on the documents it is given. They also introduced Trust-Align, a framework for aligning LLMs to the RAG task so that they achieve a higher Trust-Score. In their experiments, LLaMA-3-8B aligned with this method significantly outperformed open-source models of comparable size on the ASQA, QAMPARI, and ELI5 benchmarks.
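To make the answer-or-refuse idea concrete, here is a minimal, hypothetical sketch (not the authors' Trust-Score implementation; see their repository for that) of scoring whether a model's decision to answer or refuse agrees with whether the retrieved documents actually support an answer. The `Example` fields, refusal markers, and `refusal_agreement` function are illustrative assumptions.

```python
# Illustrative sketch only: a simplified, hypothetical "refusal agreement" check,
# NOT the paper's official Trust-Score (which is a more holistic metric).

from dataclasses import dataclass

# Hypothetical refusal phrases; a real evaluator would use a more robust detector.
REFUSAL_MARKERS = ("i cannot answer", "not enough information")


@dataclass
class Example:
    model_output: str  # the LLM's response given the question + retrieved documents
    answerable: bool   # gold label: do the retrieved documents support an answer?


def is_refusal(text: str) -> bool:
    """Crude heuristic: treat the output as a refusal if it contains a refusal phrase."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_agreement(examples: list[Example]) -> float:
    """Fraction of examples where the model's answer/refuse decision matches
    whether the documents actually support an answer (refuse iff unanswerable)."""
    if not examples:
        return 0.0
    correct = sum(
        1 for ex in examples
        if is_refusal(ex.model_output) != ex.answerable
    )
    return correct / len(examples)


if __name__ == "__main__":
    demo = [
        Example("The capital of France is Paris.", answerable=True),
        Example("I cannot answer based on the provided documents.", answerable=False),
    ]
    print(f"refusal agreement: {refusal_agreement(demo):.2f}")  # -> 1.00
```

The full Trust-Score goes beyond this decision alone: per the paper's title and abstract, it also reflects answer quality and how well responses are grounded in (attributed to) the retrieved documents.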
Why it matters?
This research is important because it enhances our understanding of how to make AI models more reliable and trustworthy, especially in applications where accurate information is critical, such as healthcare or legal advice. By improving LLMs' ability to provide grounded and accurate responses, this work can lead to better user experiences and more effective AI systems.
Abstract
LLMs are an integral part of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the quality of end-to-end RAG systems, there is a lack of research on understanding the appropriateness of an LLM for the RAG task. Thus, we introduce a new metric, Trust-Score, that provides a holistic evaluation of the trustworthiness of LLMs in an RAG framework. We show that various prompting methods, such as in-context learning, fail to adapt LLMs effectively to the RAG task. Thus, we propose Trust-Align, a framework to align LLMs for higher Trust-Score. LLaMA-3-8b, aligned with our method, significantly outperforms open-source LLMs of comparable sizes on ASQA (up 10.7), QAMPARI (up 29.2) and ELI5 (up 14.9). We release our code at: https://github.com/declare-lab/trust-align.