Key Features

Converts coarse segmentation masks into pixel-accurate alpha mattes using pretrained video diffusion priors.
Achieves strong zero-shot generalization to real-world videos despite being trained only on synthetic data.
Develops a scalable pseudo-labeling pipeline for automatic high-quality video matting annotations.
Introduces the MA-V dataset with over 50K real-world videos featuring diverse scenes and motions.
Fine-tunes SAM2 into SAM2-Matte, outperforming baselines on in-the-wild video matting tasks.
Ensures temporal consistency and detail preservation through mask-guided diffusion refinement.
Supports interactive demos for video selection and in-the-wild result visualization.
Provides quantitative and qualitative comparisons demonstrating improvements over baseline video matting methods.

Building on these capabilities, VideoMaMa introduces a scalable pseudo-labeling pipeline that automatically generates high-quality matting annotations from readily available segmentation cues. This pipeline enables the creation of the Matting Anything in Video (MA-V) dataset, comprising over 50,000 real-world videos annotated with pixel-accurate alpha mattes and covering a broad spectrum of everyday scenarios, dynamic movements, and environmental variations. By democratizing access to large-scale training data, VideoMaMa paves the way for video editing tools, compositing workflows, and augmented reality applications that demand seamless foreground-background separation.
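To make the pipeline concrete, here is a minimal sketch of a single pseudo-labeling step. The interfaces are assumptions, not the project's actual API: `load_video`, `segment`, `refine`, and `score` are hypothetical callables standing in for video loading, coarse segmentation (e.g. from SAM2), VideoMaMa-style mask-guided refinement, and a quality filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

import numpy as np


@dataclass
class PseudoLabel:
    video_path: str
    alpha_mattes: List[np.ndarray]  # one float matte in [0, 1] per frame


def build_mav_entry(
    video_path: str,
    load_video: Callable[[str], List[np.ndarray]],                               # RGB frames
    segment: Callable[[List[np.ndarray]], List[np.ndarray]],                     # coarse binary masks
    refine: Callable[[List[np.ndarray], List[np.ndarray]], List[np.ndarray]],    # refined alpha mattes
    score: Callable[[List[np.ndarray], List[np.ndarray]], float],                # pseudo-label quality
    quality_threshold: float = 0.9,
) -> Optional[PseudoLabel]:
    """Turn one raw video into a pseudo-labeled dataset entry, or reject it."""
    frames = load_video(video_path)
    coarse_masks = segment(frames)         # accessible segmentation cues
    alphas = refine(frames, coarse_masks)  # mask-guided, diffusion-refined alpha mattes

    # Keep only confidently refined videos so the resulting dataset stays clean.
    if score(frames, alphas) < quality_threshold:
        return None
    return PseudoLabel(video_path, alphas)
```

Framing each stage as a callable keeps the sketch independent of any particular segmenter, diffusion backbone, or filtering heuristic; the actual pipeline details are those of the released code.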


To demonstrate practical impact, the authors fine-tune the Segment Anything Model 2 (SAM2) on the MA-V dataset, producing SAM2-Matte, which is more robust and accurate on unseen in-the-wild videos than models trained on prior matting datasets. The architecture integrates mask-guided processing with diffusion-based refinement, ensuring temporal consistency and fine-grained detail preservation across video frames. All models, code, and the full MA-V dataset are slated for public release, enabling researchers and developers to build on generative video processing and scalable annotation strategies.
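As a usage illustration (not project code), per-frame alpha mattes such as those produced by SAM2-Matte are typically consumed with standard alpha compositing, out = alpha * foreground + (1 - alpha) * background. The sketch below assumes NumPy arrays: uint8 RGB frames and float mattes in [0, 1].

```python
import numpy as np


def composite_frame(frame: np.ndarray, alpha: np.ndarray,
                    background: np.ndarray) -> np.ndarray:
    """Standard alpha compositing of one frame over a new background.

    frame, background: H x W x 3 uint8 RGB images of the same size.
    alpha: H x W float matte in [0, 1], e.g. predicted by SAM2-Matte.
    """
    a = alpha[..., None].astype(np.float32)   # broadcast the matte over RGB channels
    fg = frame.astype(np.float32)
    bg = background.astype(np.float32)
    out = a * fg + (1.0 - a) * bg
    return np.clip(out, 0, 255).astype(np.uint8)


def composite_video(frames, alphas, background):
    # Apply per frame; temporally consistent mattes avoid flicker in the result.
    return [composite_frame(f, a, background) for f, a in zip(frames, alphas)]
```

Temporal consistency matters precisely here: if the mattes jitter from frame to frame, the composited video flickers even when each individual matte looks accurate.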
