MatAnyone: Stable Video Matting with Consistent Memory Propagation
Peiqing Yang, Shangchen Zhou, Jixin Zhao, Qingyi Tao, Chen Change Loy
2025-02-03

Summary
This paper introduces MatAnyone, a new way to separate people or objects from their backgrounds in videos, even when those backgrounds are complicated or confusing. It's like having super-smart digital scissors that can cut moving objects out of videos smoothly and accurately.
What's the problem?
Current methods for cutting out people or objects from videos without extra help often mess up when the backgrounds are complex or unclear. It's like trying to cut around a person wearing camouflage in a forest - it's hard to tell where the person ends and the background begins.
What's the solution?
The researchers created MatAnyone, which uses a clever memory system to remember what it has seen in previous video frames. This helps it keep track of the object it's cutting out, even when things get confusing. They also built a large, diverse set of practice videos to train their system, and they came up with a smart way to use existing data about object outlines (segmentation masks) to make the system even better at understanding what it's looking at.
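To make the memory idea concrete, here is a minimal PyTorch-style sketch of region-adaptive memory fusion. It is an illustration under assumptions, not the authors' published module: the layer name (`change_head`), the single-frame memory horizon, and the exact blending rule are placeholders for the core idea that stable regions reuse propagated memory while changing regions take fresh features.

```python
# A minimal sketch of region-adaptive memory fusion, assuming a
# learned per-pixel weight that blends the previous frame's memory
# with the current frame's features. Not the authors' exact module.
import torch
import torch.nn as nn

class RegionAdaptiveMemoryFusion(nn.Module):
    """Blend the previous frame's memory with current features using
    a learned per-pixel fusion weight (hypothetical design)."""

    def __init__(self, channels: int):
        super().__init__()
        # Predicts how much each pixel has changed since the last frame:
        # ~0 in stable core regions (reuse memory), ~1 near moving boundaries.
        self.change_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, curr_feat: torch.Tensor, prev_memory: torch.Tensor) -> torch.Tensor:
        # curr_feat, prev_memory: (B, C, H, W)
        w = self.change_head(torch.cat([curr_feat, prev_memory], dim=1))
        # Core regions (w ~ 0) keep propagated memory for semantic stability;
        # boundary regions (w ~ 1) favor fresh features for fine detail.
        return w * curr_feat + (1.0 - w) * prev_memory
```

Blending with a per-pixel weight, rather than overwriting memory everywhere, is what keeps the core of the object stable from frame to frame while still letting hair and other fine boundaries update.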
Why it matters?
This matters because it could make it much easier to create special effects in movies, make better video editing tools, or even help robots understand what they're seeing in the real world. Imagine being able to easily replace the background in your video calls or create cool TikTok videos where you seamlessly blend into different scenes - that's the kind of thing this technology could make possible for everyone, not just professional video editors.
Abstract
Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, boosting matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust and accurate video matting results in diverse real-world scenarios, outperforming existing methods.
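The training strategy that "leverages large-scale segmentation data" is only summarized above, so the sketch below shows one plausible way to realize it, assuming binary segmentation masks supervise the predicted alpha only in confident core regions while the uncertain boundary band is left to real matting data. The function name, the `boundary_width` parameter, and the morphology via max-pooling are illustrative assumptions, not the paper's exact loss.

```python
# A hedged sketch: use binary segmentation masks to supervise only
# the confident core of the predicted alpha matte, ignoring an
# uncertain band around object boundaries. Illustrative, not exact.
import torch
import torch.nn.functional as F

def core_region_loss(pred_alpha: torch.Tensor,
                     seg_mask: torch.Tensor,
                     boundary_width: int = 15) -> torch.Tensor:
    """pred_alpha: (B, 1, H, W) in [0, 1]; seg_mask: (B, 1, H, W) binary float."""
    k = boundary_width
    # Dilate and erode the mask via max-pooling to find the boundary band.
    dilated = F.max_pool2d(seg_mask, kernel_size=2 * k + 1, stride=1, padding=k)
    eroded = 1.0 - F.max_pool2d(1.0 - seg_mask, kernel_size=2 * k + 1, stride=1, padding=k)
    core = 1.0 - (dilated - eroded)  # 1 in confident core, 0 near boundaries
    # L1 loss on confident regions only; the boundary band stays unsupervised.
    return (core * (pred_alpha - seg_mask).abs()).sum() / core.sum().clamp(min=1.0)
```

Masking out the boundary band matters because segmentation labels are hard 0/1 values there, while true alpha mattes are fractional; supervising only the core lets cheap, large-scale segmentation data stabilize the semantics without teaching the network wrong boundary values.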