
U-Bench: A Comprehensive Understanding of U-Net through 100-Variant Benchmarking

Fenghe Tang, Chengqi Dong, Wenxin Ma, Zikang Xu, Heqin Zhu, Zihang Jiang, Rongsheng Wang, Yuhao Wang, Chenxu Wu, Shaohua Kevin Zhou

2025-10-09


Summary

This paper introduces U-Bench, a large-scale evaluation of 100 variants of U-Net, a popular deep learning model for outlining structures in medical images (a task called segmentation). It aims to provide a fair, thorough comparison of these variants to help researchers choose the best one for their needs.

What's the problem?

Although many variants of U-Net have been created, there has been no systematic way to compare them. Previous comparisons often lacked rigorous statistical validation, did not test how well models generalize across different types of medical images, and ignored how much computing power each model requires. This makes it hard to know which U-Net variant is genuinely the best.

What's the solution?

The researchers created U-Bench, which tests 100 U-Net variants on 28 medical imaging datasets spanning 10 imaging modalities. They measured how accurate each model is, how consistently it performs, how well it handles images it was never trained on (zero-shot generalization), and how efficiently it uses computing resources. They also introduced a new metric, the U-Score, that combines accuracy and efficiency into a single number. Finally, they built a 'model advisor' to help researchers pick the right U-Net for a specific project, and they released all of their data and code publicly.
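The summary mentions the U-Score without giving its formula, and the paper's exact definition is not reproduced here. The Python sketch below is only a hypothetical illustration of the general idea: blend a segmentation accuracy metric (Dice) with normalized efficiency terms (parameter count, FLOPs). The ModelStats fields, the toy_u_score weighting, and the normalization scales are all invented for illustration and are not U-Bench's actual definition.

```python
# Illustrative sketch of a performance-efficiency score in the spirit of
# U-Score. The exact formula, weights, and normalization used by U-Bench
# are NOT reproduced here; everything below is an assumption for clarity.

from dataclasses import dataclass
import math


@dataclass
class ModelStats:
    dice: float      # mean Dice overlap score across datasets, in [0, 1]
    params_m: float  # parameter count, in millions
    gflops: float    # forward-pass cost, in GFLOPs


def efficiency_term(cost: float, scale: float) -> float:
    """Map a raw cost (params or FLOPs) into (0, 1]; cheaper -> closer to 1."""
    return 1.0 / (1.0 + math.log1p(cost / scale))


def toy_u_score(m: ModelStats, alpha: float = 0.7) -> float:
    """Blend accuracy with efficiency; alpha is a hypothetical weighting."""
    efficiency = 0.5 * efficiency_term(m.params_m, scale=10.0) + \
                 0.5 * efficiency_term(m.gflops, scale=50.0)
    return alpha * m.dice + (1.0 - alpha) * efficiency


if __name__ == "__main__":
    compact = ModelStats(dice=0.84, params_m=8.0, gflops=12.0)
    heavy = ModelStats(dice=0.86, params_m=120.0, gflops=300.0)
    # Once cost is priced in, the slightly less accurate but far cheaper
    # model can rank higher than the heavyweight one.
    print(f"compact: {toy_u_score(compact):.3f}  heavy: {toy_u_score(heavy):.3f}")
```

The point of any such score is that rankings change once computational cost is priced in: a slightly less accurate but much cheaper variant can overtake a heavyweight one, which is the deployment-oriented perspective the authors emphasize.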

Why it matters?

U-Bench provides a much-needed standard for evaluating U-Net models. It helps researchers avoid wasting time on poorly performing models and guides them towards the best options for their specific medical imaging tasks. By making everything public, the researchers are encouraging further development and improvement of these important tools, supporting more reliable and efficient medical image analysis in the future.

Abstract

Over the past decade, U-Net has been the dominant architecture in medical image segmentation, leading to the development of thousands of U-shaped variants. Despite its widespread adoption, there is still no comprehensive benchmark to systematically evaluate their performance and utility, largely because of insufficient statistical validation and limited consideration of efficiency and generalization across diverse datasets. To bridge this gap, we present U-Bench, the first large-scale, statistically rigorous benchmark that evaluates 100 U-Net variants across 28 datasets and 10 imaging modalities. Our contributions are threefold: (1) Comprehensive Evaluation: U-Bench evaluates models along three key dimensions: statistical robustness, zero-shot generalization, and computational efficiency. We introduce a novel metric, U-Score, which jointly captures the performance-efficiency trade-off, offering a deployment-oriented perspective on model progress. (2) Systematic Analysis and Model Selection Guidance: We summarize key findings from the large-scale evaluation and systematically analyze the impact of dataset characteristics and architectural paradigms on model performance. Based on these insights, we propose a model advisor agent to guide researchers in selecting the most suitable models for specific datasets and tasks. (3) Public Availability: We provide all code, models, protocols, and weights, enabling the community to reproduce our results and extend the benchmark with future methods. In summary, U-Bench not only exposes gaps in previous evaluations but also establishes a foundation for fair, reproducible, and practically relevant benchmarking in the next decade of U-Net-based segmentation models. The project can be accessed at: https://fenghetan9.github.io/ubench. Code is available at: https://github.com/FengheTan9/U-Bench.