SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models

Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova

2025-03-05


Summary

This paper introduces SPIDER, a large new dataset of pathology images from multiple organs that helps train AI models to recognize and analyze tissue samples more accurately.

What's the problem?

Current datasets for training AI on pathology images are limited. They often cover only one organ, include few tissue types, or have low-quality labels. This makes it hard for AI models to learn about different organs and diseases.

What's the solution?

The researchers created SPIDER, which contains high-quality image patches of tissue from three organ types: skin, colorectal, and thorax. The labels are verified by expert pathologists, and each patch comes with surrounding context patches that show the neighboring tissue. The researchers also trained baseline AI models on this data that classify tissue types with state-of-the-art accuracy.

Why does it matter?

SPIDER could help doctors diagnose diseases faster and more accurately. The dataset and AI models are publicly available, so researchers worldwide can use them to improve medical AI. This could lead to better healthcare, quicker diagnoses, and new ways to study diseases across different organs.

Abstract

Advancing AI in computational pathology requires large, high-quality, and diverse datasets, yet existing public datasets are often limited in organ diversity, class coverage, or annotation quality. To bridge this gap, we introduce SPIDER (Supervised Pathology Image-DEscription Repository), the largest publicly available patch-level dataset covering multiple organ types, including Skin, Colorectal, and Thorax, with comprehensive class coverage for each organ. SPIDER provides high-quality annotations verified by expert pathologists and includes surrounding context patches, which enhance classification performance by providing spatial context. Alongside the dataset, we present baseline models trained on SPIDER using the Hibou-L foundation model as a feature extractor combined with an attention-based classification head. The models achieve state-of-the-art performance across multiple tissue categories and serve as strong benchmarks for future digital pathology research. Beyond patch classification, the model enables rapid identification of significant areas and computation of quantitative tissue metrics, and it establishes a foundation for multimodal approaches. Both the dataset and trained models are publicly available to advance research, reproducibility, and AI-driven pathology development. Access them at: https://github.com/HistAI/SPIDER
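
To make the abstract's description of the baseline architecture more concrete, here is a minimal sketch, under stated assumptions, of an attention-based classification head that pools a center patch embedding together with the embeddings of its surrounding context patches. It does not reproduce the authors' model: the Hibou-L feature extractor is replaced by random vectors, the weights are random rather than trained, and the class name ContextAttentionHead, the single-query scaled dot-product attention, the identity value projection, the toy embedding size, and the 24-patch context neighborhood are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x dim) matrix by a dim-length vector."""
    return [dot(row, v) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Gaussian-initialized weights; a real head would learn these."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ContextAttentionHead:
    """Hypothetical attention-pooling classifier over patch embeddings.

    The center patch embedding is the query; the center patch and its
    context patches serve as keys and values.
    """

    def __init__(self, dim: int, num_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.w_query = random_matrix(dim, dim, rng)
        self.w_key = random_matrix(dim, dim, rng)
        self.w_out = random_matrix(num_classes, dim, rng)

    def forward(self, center: Vector, context: Sequence[Vector]) -> Tuple[Vector, Vector]:
        tokens = [center, *context]
        query = matvec(self.w_query, center)
        keys = [matvec(self.w_key, t) for t in tokens]
        # One scaled dot-product score per token (center first, then context).
        scores = [dot(query, k) / math.sqrt(self.dim) for k in keys]
        attn = softmax(scores)
        # Values are the raw embeddings (identity value projection, for brevity).
        pooled = [sum(a * t[i] for a, t in zip(attn, tokens)) for i in range(self.dim)]
        class_probs = softmax(matvec(self.w_out, pooled))
        return class_probs, attn


if __name__ == "__main__":
    rng = random.Random(42)
    dim, num_classes, num_context = 16, 5, 24  # toy sizes, not SPIDER's real ones
    center = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    context = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_context)]
    head = ContextAttentionHead(dim, num_classes)
    probs, attn = head.forward(center, context)
    print("predicted class:", max(range(num_classes), key=probs.__getitem__))
    print("attention on center patch:", round(attn[0], 3))
```

Using the center patch as the query lets the head decide how much each neighboring patch contributes to the prediction, and the returned attention weights indicate which context patches the classifier relied on.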
