Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease

Elliot Schumacher, Dhruv Naik, Anitha Kannan

2025-02-24

Summary

This paper introduces RareScale, a system that combines large language models (LLMs) with medical expert systems to help doctors diagnose rare diseases more accurately.

What's the problem?

Rare diseases are hard to diagnose because they occur infrequently, so doctors may have little experience with them. LLMs have shown strong results on disease diagnosis in general, but how well they handle rarer diseases is less clear. Existing clinical decision support tools for rare diseases are difficult to use and know little about common disorders. This is a serious gap, because missing a rare disease diagnosis can be dangerous for patients.

What's the solution?

The researchers created RareScale, which uses an LLM together with a medical expert system to simulate patient conversations about rare diseases. They used these simulated chats to train a smaller model that proposes candidate rare diseases. Those candidates are then passed as extra input to a general-purpose, black-box LLM, which produces the final differential diagnosis. They evaluated the system on over 575 rare diseases and found that it improved Top-5 accuracy by over 17% compared with the black-box LLM alone.

Why does it matter?

This matters because it could help doctors catch rare diseases earlier and more often. By combining AI with expert knowledge, RareScale makes it easier for doctors to consider rare diseases without ignoring common ones. This could lead to faster, more accurate diagnoses for patients with rare conditions, potentially saving lives and reducing the time and stress of searching for the right diagnosis.

Abstract

Large language models (LLMs) have demonstrated impressive capabilities in disease diagnosis. However, their effectiveness in identifying rarer diseases, which are inherently more challenging to diagnose, remains an open question. Rare disease performance is critical with the increasing use of LLMs in healthcare settings. This is especially true if a primary care physician needs to make a rarer prognosis from only a patient conversation so that they can take the appropriate next step. To that end, several clinical decision support systems are designed to support providers in rare disease identification. Yet their utility is limited due to their lack of knowledge of common disorders and difficulty of use. In this paper, we propose RareScale to combine the knowledge of LLMs with expert systems. We jointly use an expert system and an LLM to simulate rare disease chats. This data is used to train a rare disease candidate predictor model. Candidates from this smaller model are then used as additional inputs to a black-box LLM to make the final differential diagnosis. Thus, RareScale allows for a balance between rare and common diagnoses. We present results on over 575 rare diseases, beginning with Abdominal Actinomycosis and ending with Wilson's Disease. Our approach significantly improves the baseline performance of black-box LLMs by over 17% in Top-5 accuracy. We also find that our candidate generation performance is high (e.g., 88.8% on gpt-4o-generated chats).
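
The last two steps of the abstract (passing rare disease candidates to a black-box LLM, then scoring with Top-5 accuracy) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_ddx_prompt` and `top_k_accuracy`, the prompt wording, and the case-insensitive exact-match scoring rule are all assumptions made for illustration, and the actual LLM call is left out.

```python
from typing import Sequence


def build_ddx_prompt(chat: str, rare_candidates: Sequence[str], k: int = 5) -> str:
    """Build a prompt that adds rare disease candidates to a patient chat.

    Hypothetical template: the paper's actual prompt is not given in the abstract.
    """
    candidate_lines = "\n".join(f"- {name}" for name in rare_candidates)
    return (
        "Patient conversation:\n"
        f"{chat}\n\n"
        "Rare diseases suggested by a candidate model (may be wrong):\n"
        f"{candidate_lines}\n\n"
        f"List the {k} most likely diagnoses, common or rare, one per line."
    )


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    gold_labels: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of cases whose gold diagnosis appears in the first k predictions.

    Assumes case-insensitive exact string matching, which is a simplification.
    """
    if len(ranked_predictions) != len(gold_labels):
        raise ValueError("Each case needs exactly one gold label.")
    if not gold_labels:
        return 0.0
    hits = sum(
        gold.lower() in [p.lower() for p in preds[:k]]
        for preds, gold in zip(ranked_predictions, gold_labels)
    )
    return hits / len(gold_labels)


if __name__ == "__main__":
    prompt = build_ddx_prompt(
        chat="Patient reports tremor and fatigue.",
        rare_candidates=["Wilson's Disease", "Abdominal Actinomycosis"],
    )
    print(prompt)

    # Two toy cases: the first gold label is in the top 5, the second is not.
    predictions = [
        ["Essential tremor", "Wilson's Disease"],
        ["Appendicitis", "Crohn's disease"],
    ]
    gold = ["Wilson's Disease", "Abdominal Actinomycosis"]
    print(top_k_accuracy(predictions, gold, k=5))
```

Listing the candidates as suggestions that "may be wrong" reflects the balance the abstract describes: the black-box LLM stays free to rank a common diagnosis above a proposed rare one.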
