How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions
Bojana Bašaragin, Adela Ljajić, Darija Medvecki, Lorenzo Cassano, Miloš Košprdić, Nikola Milošević
2024-07-10

Summary
This paper presents a system that improves how large language models (LLMs) answer biomedical questions. It focuses on making these answers more accurate and verifiable by using a method called retrieval-augmented generation (RAG).
What's the problem?
The main problem is that while LLMs can generate fluent and informative responses, they often struggle with accuracy, especially in sensitive areas like biomedicine. This can lead to misinformation, which is particularly concerning when users rely on these answers for important health-related decisions.
What's the solution?
To solve this issue, the authors developed a RAG system built around an LLM fine-tuned specifically for answering biomedical questions with references. The system retrieves relevant research abstracts from PubMed and passes them into the LLM's context when generating answers. Because each statement in the answer cites one of these abstracts, users can verify the answer against reliable sources. The results show that the retrieval component improves answer-supporting retrieval by 23 percentage points (absolute) over the standard PubMed search engine.
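The retrieve-then-cite flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the PubMed retriever is replaced by a toy in-memory keyword scorer so the example runs standalone.

```python
def retrieve(query, abstracts, k=2):
    """Rank abstracts by naive keyword overlap with the query
    (a stand-in for the paper's actual PubMed retrieval)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        abstracts,
        key=lambda a: len(q_terms & set(a["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query, retrieved):
    """Insert numbered abstracts into the LLM context so the model
    can reference each statement as [1], [2], ... in its answer."""
    refs = "\n".join(
        f"[{i}] ({a['pmid']}) {a['text']}"
        for i, a in enumerate(retrieved, 1)
    )
    return (
        "Answer the question using only the abstracts below. "
        "Cite each statement with the matching [number].\n\n"
        f"{refs}\n\nQuestion: {query}\nAnswer:"
    )


# Toy corpus in place of PubMed; PMIDs are made up for illustration.
abstracts = [
    {"pmid": "PMID:0001",
     "text": "Vitamin D supplementation reduces fracture risk."},
    {"pmid": "PMID:0002",
     "text": "Statins lower LDL cholesterol in adults."},
]

query = "Does vitamin D reduce fracture risk?"
prompt = build_prompt(query, retrieve(query, abstracts))
print(prompt)
```

The prompt produced this way would then be sent to the fine-tuned Mistral-7B model, whose bracketed citations let a reader trace every claim back to a specific abstract.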
Why it matters?
This research is important because it addresses the critical need for accurate information in healthcare and biomedical fields. By enhancing LLMs with a reliable referencing system, users can trust the answers they receive, which can lead to better-informed decisions regarding their health. Additionally, making the dataset and models publicly available promotes further research and development in this area.
Abstract
Large language models (LLMs) have recently become the leading source of answers for users' questions online. Despite their ability to offer eloquent answers, their accuracy and reliability can pose a significant challenge. This is especially true for sensitive domains such as biomedicine, where there is a higher need for factually correct answers. This paper introduces a biomedical retrieval-augmented generation (RAG) system designed to enhance the reliability of generated responses. The system is based on an LLM fine-tuned for referenced question answering, where relevant abstracts retrieved from PubMed are passed into the LLM's context through the prompt. Its output is an answer grounded in PubMed abstracts, where each statement is referenced accordingly, allowing the users to verify the answer. Our retrieval system achieves an absolute improvement of 23% compared to the PubMed search engine. Based on a manual evaluation of a small sample, our fine-tuned LLM component achieves results comparable to GPT-4 Turbo in referencing relevant abstracts. We make the dataset used to fine-tune the models, as well as the fine-tuned models based on Mistral-7B-instruct-v0.1 and v0.2, publicly available.