Question Answering on Patient Medical Records with Private Fine-Tuned LLMs

Sara Kothari, Ayush Gupta

2025-01-27

Summary

This paper explores using Large Language Models (LLMs), a type of AI, to help people understand and use their medical records more easily. The researchers found a way to make these models work well while keeping patients' information private and secure.

What's the problem?

Hospitals and doctors generate huge amounts of digital medical records, but these records are often hard for patients to understand or use. The information is complicated, and there is so much of it that finding what you need is difficult. Also, sending this data to regular cloud-based AI services could put patients' private information at risk.

What's the solution?

The researchers came up with a two-step process to solve this problem. First, they taught an AI model to find the most relevant parts of a medical record for what someone is asking about (Task 1). Then, they taught it to answer the question using just those relevant parts (Task 2). Instead of big public models, they used smaller, specially fine-tuned models that can be hosted privately and kept secure. These specialized models actually outperformed famous ones like GPT-4, despite being about 250 times smaller.
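The two-step process above can be sketched as a small Python pipeline. The functions below are stand-ins, not the paper's method: Task 1 is approximated with simple keyword overlap and Task 2 with text assembly, where the paper uses private fine-tuned LLMs for both steps. The resource class and sample records are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FHIRResource:
    """Toy stand-in for a FHIR resource: its type plus a text rendering."""
    resource_type: str
    text: str

def select_relevant_resources(query, resources, top_k=2):
    """Task 1 stand-in: rank resources by keyword overlap with the query.
    (The paper fine-tunes a private LLM to pick the relevant FHIR resources.)"""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(r.text.lower().split())), r) for r in resources]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored[:top_k] if score > 0]

def answer_from_resources(query, relevant):
    """Task 2 stand-in: ground the 'answer' in the selected resources only.
    (The paper uses a second fine-tuned LLM to generate the answer.)"""
    context = " ".join(r.text for r in relevant)
    return f"Based on your records: {context}"

# Hypothetical patient records and question.
records = [
    FHIRResource("MedicationRequest", "Lisinopril 10 mg daily for blood pressure"),
    FHIRResource("Observation", "Blood pressure 128/82 recorded at last visit"),
    FHIRResource("Immunization", "Influenza vaccine administered in October"),
]
question = "What is my blood pressure medication?"
relevant = select_relevant_resources(question, records)
print(answer_from_resources(question, relevant))
```

The design point the sketch preserves is the split itself: answering only from the resources selected in the first step keeps the second step grounded in the patient's actual record rather than the model's general knowledge.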

Why it matters?

This matters because it could make it much easier for people to understand their own health information without risking their privacy. Imagine being able to ask your phone questions about your medical records and getting clear, accurate answers without worrying about your personal information being shared. This could help people take better care of their health, understand their doctor's instructions better, and feel more in control of their medical care. It's also important for hospitals and doctors, who need to keep patient information private while still using modern technology to provide better care.

Abstract

Healthcare systems continuously generate vast amounts of electronic health records (EHRs), commonly stored in the Fast Healthcare Interoperability Resources (FHIR) standard. Despite the wealth of information in these records, their complexity and volume make it difficult for users to retrieve and interpret crucial health insights. Recent advances in Large Language Models (LLMs) offer a solution, enabling semantic question answering (QA) over medical data, allowing users to interact with their health records more effectively. However, ensuring privacy and compliance requires edge and private deployments of LLMs. This paper proposes a novel approach to semantic QA over EHRs by first identifying the most relevant FHIR resources for a user query (Task1) and subsequently answering the query based on these resources (Task2). We explore the performance of privately hosted, fine-tuned LLMs, evaluating them against benchmark models such as GPT-4 and GPT-4o. Our results demonstrate that fine-tuned LLMs, while 250x smaller in size, outperform GPT-4 family models by 0.55% in F1 score on Task1 and 42% in Meteor score on Task2. Additionally, we examine advanced aspects of LLM usage, including sequential fine-tuning, model self-evaluation (narcissistic evaluation), and the impact of training data size on performance. The models and datasets are available here: https://huggingface.co/genloop
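To make the FHIR context concrete, here is a minimal sketch of a single FHIR resource of the kind the paper's Task1 retrieves, written as a Python dict. The field names follow the FHIR R4 Observation schema; all values are illustrative and not taken from the paper.

```python
# Illustrative FHIR R4 Observation resource (a blood pressure reading).
# FHIR stores each clinical fact as a typed resource like this; Task1 in the
# paper selects which such resources are relevant to a user's question.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Blood pressure"},
    "subject": {"reference": "Patient/example"},
    "effectiveDateTime": "2024-11-02",
    "component": [
        {"code": {"text": "Systolic"}, "valueQuantity": {"value": 128, "unit": "mmHg"}},
        {"code": {"text": "Diastolic"}, "valueQuantity": {"value": 82, "unit": "mmHg"}},
    ],
}
print(observation["resourceType"], "-", observation["code"]["text"])
```

A real patient record contains many such resources (medications, lab results, immunizations), which is why narrowing them down before answering is the first task.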