MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos

Rongsheng Wang, Junying Chen, Ke Ji, Zhenyang Cai, Shunian Chen, Yunjin Yang, Benyou Wang

2025-07-09

Summary

This paper introduces MedGen, a video generation model trained on MedVideoCap-55K, a large dataset of over 55,000 carefully selected medical video clips paired with detailed captions. MedGen can generate high-quality medical videos that are both visually realistic and medically accurate.

What's the problem?

Existing video generation models often produce videos that are visually blurry or medically incorrect when applied to medical topics. The main cause is the lack of large, high-quality medical video datasets for training specialized models.

What's the solution?

The researchers built MedVideoCap-55K from diverse, carefully filtered medical videos with paired captions and used it to train MedGen. Trained on this data, MedGen balances visual quality and medical accuracy better than other open-source models, which makes it safer and more reliable for medical education, simulation, and related tasks.

Why does it matter?

This matters because generating accurate medical videos can improve clinical training, help doctors and patients understand medical procedures better, and support advances in healthcare through realistic simulations and educational tools.

Abstract

MedGen, a model trained on MedVideoCap-55K, achieves leading performance among open-source models in medical video generation, balancing visual quality and medical accuracy.
