
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

Hitesh Wadhwa, Rahul Seetharaman, Somyaa Aggarwal, Reshmi Ghosh, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh

2024-06-19


Summary

This paper explores how language models use external information to improve their answers when responding to questions. It focuses on a technique called Retrieval Augmented Generation (RAG), which combines the model's internal knowledge with additional context from outside sources.
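The retrieve-then-answer flow described above can be sketched in a few lines. This is a toy illustration, not the paper's code: retrieval here is a simple bag-of-words cosine similarity, whereas real RAG systems use dense embeddings and pass the assembled prompt to a language model.

```python
# Minimal RAG-style pipeline sketch: retrieve the passage most relevant
# to a query, then prepend it as context to the prompt that would be
# sent to a language model.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, passages: list[str]) -> str:
    # Pick the passage with the highest similarity to the query.
    q = Counter(query.lower().split())
    return max(passages, key=lambda p: cosine(q, Counter(p.lower().split())))

def build_rag_prompt(query: str, passages: list[str]) -> str:
    # Combine retrieved context with the user question, as a RAG
    # system would before calling the language model.
    context = retrieve(query, passages)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "The Great Wall of China is over 13,000 miles long.",
]
print(build_rag_prompt("Where is the Eiffel Tower?", passages))
```

The model then answers from this combined prompt; the paper's question is how much of the answer comes from the retrieved context versus the model's own parameters.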

What's the problem?

While RAG has become popular for enhancing the performance of language models on tasks like search and question answering, it is not well understood how it actually works. A key concern is that these models often rely more on the external context provided at retrieval time than on their own internal knowledge, which can lead to incomplete or biased responses.

What's the solution?

The authors mechanistically analyzed the RAG pipeline to understand how language models utilize external knowledge. Using Causal Mediation Analysis, they showed that the models make minimal use of their parametric (internal) memory when answering questions. With Attention Contributions and Knockouts, they further showed that the last token's representation is enriched mainly by informative tokens in the retrieved context rather than by the subject token of the question itself. This "shortcut" behavior was observed in both the LLaMa and Phi families of models, indicating a common pattern across different systems.
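The knockout idea can be illustrated with a toy single-head attention computation (this is an illustrative sketch, not the paper's implementation): block the last token's query from attending to a chosen position, then measure how much its output vector changes. A small change when blocking the question's subject token, versus a large change when blocking context tokens, is the kind of evidence the paper reports.

```python
# Toy attention "knockout": forbid the last token from attending to
# selected key positions and measure the effect on its output vector.
import numpy as np

def attention_output(q, K, V, blocked=()):
    # Single-head attention for one query vector; `blocked` lists the
    # key positions the query is forbidden to attend to.
    scores = K @ q / np.sqrt(len(q))
    for i in blocked:
        scores[i] = -np.inf  # knockout: sever this attention edge
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V  # weighted sum of value vectors

rng = np.random.default_rng(0)
d, n = 8, 5                    # hidden size, sequence length
K = rng.normal(size=(n, d))    # key vectors for the n tokens
V = rng.normal(size=(n, d))    # value vectors for the n tokens
q = rng.normal(size=d)         # query vector from the last token

base = attention_output(q, K, V)
knocked = attention_output(q, K, V, blocked=[2])  # e.g. block the subject token
effect = np.linalg.norm(base - knocked)
print(f"knockout effect on last-token output: {effect:.3f}")
```

In the paper this intervention is applied inside real transformer layers; comparing the effect size across positions shows which tokens the last-token residual stream actually draws from.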

Why it matters?

This research is important because it sheds light on how language models operate when using external information. By understanding these mechanisms better, researchers can improve the design of RAG systems, making them more reliable and effective for generating accurate responses. This could enhance applications such as chatbots, search engines, and any AI tool that relies on understanding and answering user queries.

Abstract

Retrieval Augmented Generation (RAG) enriches the ability of language models to reason using external context to augment responses for a given user prompt. This approach has risen in popularity due to practical applications of language models in search, question answering, and chat-bots. However, the exact nature of how this approach works isn't clearly understood. In this paper, we mechanistically examine the RAG pipeline to highlight that language models take a shortcut and have a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory. We probe this mechanistic behavior in language models with: (i) Causal Mediation Analysis to show that the parametric memory is minimally utilized when answering a question and (ii) Attention Contributions and Knockouts to show that the last token residual stream does not get enriched from the subject token in the question, but gets enriched from other informative tokens in the context. We find this pronounced shortcut behaviour true across both the LLaMa and Phi families of models.