Med42-v2: A Suite of Clinical LLMs

Clément Christophe, Praveen K Kanithi, Tathagata Raha, Shadab Khan, Marco AF Pimentel

2024-08-13

Summary

This paper presents Med42-v2, a suite of large language models (LLMs) designed specifically for healthcare that give better responses to medical queries than general-purpose models.

What's the problem?

Generic language models often struggle to provide accurate answers in healthcare settings because they are not specifically trained on medical data. This can lead to incorrect or overly cautious responses when dealing with clinical questions, making them less useful for healthcare professionals.

What's the solution?

The authors developed Med42-v2, which is based on the Llama3 architecture and fine-tuned with specialized clinical data. Through a multi-stage preference-alignment process, the models are trained to understand and respond to natural medical prompts instead of declining clinical questions, as generic models often do. In both the 8B and 70B parameter sizes, Med42-v2 models outperform the original Llama3 models, as well as GPT-4, across a range of medical benchmarks.

Why it matters?

This research is significant because it extends what AI can do in healthcare, offering a valuable tool for doctors and medical researchers. By providing accurate and relevant information, Med42-v2 can help improve patient care, streamline decision-making, and support healthcare professionals in their work. Making the models publicly available also encourages further innovation and collaboration in medical AI.

Abstract

Med42-v2 introduces a suite of clinical large language models (LLMs) designed to address the limitations of generic models in healthcare settings. These models are built on the Llama3 architecture and fine-tuned using specialized clinical data. They underwent multi-stage preference alignment to respond effectively to natural prompts. While generic models are often preference-aligned to avoid answering clinical queries as a precaution, Med42-v2 is specifically trained to overcome this limitation, enabling its use in clinical settings. In both the 8B and 70B parameter configurations, Med42-v2 models outperform the original Llama3 models, as well as GPT-4, across various medical benchmarks. These LLMs are developed to understand clinical queries, perform reasoning tasks, and provide valuable assistance in clinical environments. The models are now publicly available at https://huggingface.co/m42-health.
