Named Clinical Entity Recognition Benchmark
Wadood M Abdul, Marco AF Pimentel, Muhammad Umar Salman, Tathagata Raha, Clément Christophe, Praveen K Kanithi, Nasir Hayat, Ronnie Rajan, Shadab Khan
2024-10-08

Summary
This paper introduces the Named Clinical Entity Recognition Benchmark, a new framework for evaluating how well language models can identify and classify important medical terms from clinical texts.
What's the problem?
In healthcare, it's essential to extract structured information from unstructured clinical narratives, such as patient notes and medical records. However, existing methods for assessing language models in this area are not standardized, making it difficult to compare their performance. This lack of a clear evaluation framework can lead to uncertainty about how well these models actually work in real-world medical applications.
What's the solution?
To solve this problem, the authors created a benchmark that includes a variety of tasks focused on recognizing clinical entities such as diseases, symptoms, medications, and procedures. They used a collection of publicly available clinical datasets and standardized the entities according to an established medical data model (the OMOP Common Data Model). The benchmark scores models on how accurately they identify these entities, primarily using the F1-score, the harmonic mean of precision and recall. This allows for a more reliable, like-for-like assessment of different language models in healthcare settings.
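To make the scoring idea concrete, the sketch below shows one common way to compute entity-level precision, recall, and F1-score by exact span-and-type matching. This is a generic illustration, not the leaderboard's actual evaluation code, and the entity spans and labels are invented for the example.

```python
# Minimal sketch of entity-level NER scoring by exact (span, type) matching.
# Illustrative only -- not the benchmark's official evaluation code.

def ner_scores(gold_entities, predicted_entities):
    """Compute precision, recall, and F1 over sets of (start, end, label) tuples."""
    gold = set(gold_entities)
    pred = set(predicted_entities)
    true_positives = len(gold & pred)  # counts only exact span and label matches
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical example: character offsets and entity types for one clinical note.
gold = [(0, 9, "CONDITION"), (24, 33, "DRUG"), (40, 52, "PROCEDURE")]
pred = [(0, 9, "CONDITION"), (24, 33, "DRUG"), (60, 70, "MEASUREMENT")]

p, r, f1 = ner_scores(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```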
Why it matters?
This research is important because it establishes a rigorous standard for evaluating language models in the healthcare field. By providing a clear benchmark, it promotes transparency and allows researchers and developers to compare different models effectively. This can lead to improvements in how medical information is processed and understood by AI systems, ultimately enhancing patient care and supporting clinical decision-making.
Abstract
This technical report introduces a Named Clinical Entity Recognition Benchmark for evaluating language models in healthcare, addressing the crucial natural language processing (NLP) task of extracting structured information from clinical narratives to support applications such as automated coding, clinical trial cohort identification, and clinical decision support. The leaderboard provides a standardized platform for assessing diverse language models, including encoder and decoder architectures, on their ability to identify and classify clinical entities across multiple medical domains. A curated collection of openly available clinical datasets is utilized, encompassing entities such as diseases, symptoms, medications, procedures, and laboratory measurements. Importantly, these entities are standardized according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, ensuring consistency and interoperability across different healthcare systems and datasets and enabling a comprehensive evaluation of model performance. Model performance is primarily assessed using the F1-score, complemented by various assessment modes that offer additional insight. The report also includes a brief analysis of models evaluated to date, highlighting observed trends and limitations. By establishing this benchmarking framework, the leaderboard aims to promote transparency, facilitate comparative analyses, and drive innovation in clinical entity recognition tasks, addressing the need for robust evaluation methods in healthcare NLP.
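For readers unfamiliar with OMOP, the standardization step described in the abstract amounts to mapping each dataset's native entity labels onto a shared set of OMOP domains (e.g., Condition, Drug, Procedure, Measurement). The snippet below is a hypothetical illustration of that idea; the source labels on the left are invented and do not come from the benchmark's datasets.

```python
# Hypothetical illustration of harmonizing dataset-specific NER labels to OMOP domains.
# The source labels below are invented examples, not taken from the benchmark's datasets.

OMOP_DOMAIN_MAP = {
    "DISEASE": "Condition",
    "SYMPTOM": "Condition",
    "MEDICATION": "Drug",
    "DRUG": "Drug",
    "SURGERY": "Procedure",
    "LAB_TEST": "Measurement",
}

def to_omop_domain(label: str) -> str:
    """Map a dataset-specific entity label to a common OMOP domain, or flag it as unmapped."""
    return OMOP_DOMAIN_MAP.get(label.upper(), "Unmapped")

print(to_omop_domain("medication"))  # Drug
print(to_omop_domain("lab_test"))    # Measurement
```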