Current Pathology Foundation Models are unrobust to Medical Center Differences

Edwin D. de Jong, Eric Marcus, Jonas Teuwen

2025-02-04

Summary

This paper shows that current AI models used in pathology, called foundation models (FMs), are not robust when applied to data from different medical centers. The researchers introduce a new way to measure how well these models focus on biological features, like tissue and cancer type, instead of being influenced by differences in medical center procedures.

What's the problem?

Pathology foundation models often perform poorly when analyzing data from different medical centers because they are affected by center-specific variations, such as differences in staining procedures or equipment. Instead of focusing on important biological features, these models are overly influenced by the medical center the data comes from. This makes them unreliable for clinical use.

What's the solution?

The researchers created a new metric called the Robustness Index to measure how much these models focus on biological features versus medical center-specific factors. They tested ten publicly available pathology foundation models and found that most were more influenced by medical center differences than by biological features. They also analyzed how this lack of robustness affects the accuracy of cancer classification and visualized how the models organize data, showing that they prioritize medical center information over biological data.
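The paper defines an index where values above 1 mean biological features dominate confounding (center) features. The exact formula is given in the paper; the sketch below is one plausible k-nearest-neighbor formulation, not necessarily the authors' definition: for each embedding, count how often its neighbors share its biological label versus its medical-center label, and take the ratio.

```python
import numpy as np

def robustness_index(embeddings, bio_labels, center_labels, k=10):
    """Illustrative k-NN robustness index (hypothetical formulation):
    fraction of neighbors sharing the biological label divided by the
    fraction sharing the medical-center label. Values above 1 mean
    biological structure dominates center structure."""
    # Pairwise Euclidean distances, then each point's k nearest
    # neighbors (column 0 is the point itself, so it is skipped).
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, 1:k + 1]
    bio_match = (bio_labels[idx] == bio_labels[:, None]).mean()
    center_match = (center_labels[idx] == center_labels[:, None]).mean()
    return bio_match / center_match

# Synthetic sanity check: embeddings clustered by biology, centers random.
rng = np.random.default_rng(0)
bio = rng.integers(0, 2, 200)
center = rng.integers(0, 2, 200)
emb = np.c_[bio * 5.0, np.zeros(200)] + rng.normal(size=(200, 2))
ri = robustness_index(emb, bio, center)  # > 1: biology dominates here
```

With the roles reversed (embeddings clustered by center instead of biology), the same function returns a value below 1, which is the regime the paper reports for most of the evaluated models.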

Why it matters?

This research is important because it highlights a major limitation in current pathology AI models, which need to be reliable across different medical centers to be useful in real-world healthcare. By introducing the Robustness Index, this study provides a tool to improve these models, helping pave the way for more accurate and trustworthy AI in diagnosing and treating diseases.

Abstract

Pathology Foundation Models (FMs) hold great promise for healthcare. Before they can be used in clinical practice, it is essential to ensure they are robust to variations between medical centers. We measure whether pathology FMs focus on biological features like tissue and cancer type, or on the well-known confounding medical center signatures introduced by staining procedure and other differences. We introduce the Robustness Index. This novel robustness metric reflects to what degree biological features dominate confounding features. Ten current publicly available pathology FMs are evaluated. We find that all current pathology foundation models evaluated represent the medical center to a strong degree. Significant differences in the robustness index are observed. Only one model so far has a robustness index greater than one, meaning biological features dominate confounding features, but only slightly. A quantitative approach to measure the influence of medical center differences on FM-based prediction performance is described. We analyze the impact of unrobustness on classification performance of downstream models, and find that cancer-type classification errors are not random, but specifically attributable to same-center confounders: images of other classes from the same medical center. We visualize FM embedding spaces, and find these are more strongly organized by medical centers than by biological factors. As a consequence, the medical center of origin is predicted more accurately than the tissue source and cancer type. The robustness index introduced here is provided with the aim of advancing progress towards clinical adoption of robust and reliable pathology FMs.
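The abstract's finding that the medical center of origin is predicted more accurately than tissue or cancer type can be checked with simple probes on top of frozen embeddings. The sketch below is illustrative only (synthetic embeddings and a nearest-centroid probe, not the authors' evaluation code): it builds embeddings where the center signal is deliberately stronger than the biological signal and compares probe accuracies.

```python
import numpy as np

def probe_accuracy(X, y, train=300, seed=0):
    """Accuracy of a nearest-centroid probe on a held-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    tr, te = idx[:train], idx[train:]
    classes = np.unique(y)
    # One centroid per class, fit on the training split only.
    centroids = np.stack([X[tr][y[tr] == c].mean(axis=0) for c in classes])
    pred = classes[np.argmin(
        np.linalg.norm(X[te][:, None] - centroids[None, :], axis=-1), axis=1)]
    return (pred == y[te]).mean()

# Synthetic embeddings in which center identity separates points more
# strongly than cancer type, mimicking the paper's qualitative finding.
rng = np.random.default_rng(0)
n = 400
center = rng.integers(0, 4, n)
cancer = rng.integers(0, 4, n)
emb = np.c_[center * 3.0, cancer * 1.0] + rng.normal(size=(n, 2))

acc_center = probe_accuracy(emb, center)
acc_cancer = probe_accuracy(emb, cancer)
# Expect the center probe to score higher than the cancer-type probe.
```

A robust foundation model would show the opposite ordering: biological labels recoverable at least as well as the center of origin.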