MoViE: Mobile Diffusion for Video Editing
Adil Karjauv, Noor Fathima, Ioannis Lelekas, Fatih Porikli, Amir Ghodrati, Amirhossein Habibian
2024-12-11

Summary
This paper introduces MoViE, a diffusion-based video editing model optimized to run quickly and efficiently on smartphones.
What's the problem?
Diffusion-based video editing has advanced quickly, but most models need far more computing power than a mobile device can provide. As a result, users cannot run these editing tools directly on their phones.
What's the solution?
The authors built MoViE by optimizing an existing image editing model for mobile hardware. They streamlined the model's architecture and added a lightweight autoencoder, then extended classifier-free guidance distillation to multiple modalities, which made the model three times faster on device. Finally, they used a new adversarial distillation scheme to cut the number of sampling steps to one while keeping the edit controllable. Together, these changes let MoViE edit video at 12 frames per second on a mobile device while maintaining high quality.
Why does it matter?
This research makes advanced video editing accessible to more people by letting it run on smartphones. With MoViE, users can create and modify videos anywhere, which is useful for social media, personal projects, and professional content creation.
Abstract
Recent progress in diffusion-based video editing has shown remarkable potential for practical applications. However, these methods remain prohibitively expensive and challenging to deploy on mobile devices. In this study, we introduce a series of optimizations that render mobile video editing feasible. Building upon the existing image editing model, we first optimize its architecture and incorporate a lightweight autoencoder. Subsequently, we extend classifier-free guidance distillation to multiple modalities, resulting in a threefold on-device speedup. Finally, we reduce the number of sampling steps to one by introducing a novel adversarial distillation scheme which preserves the controllability of the editing process. Collectively, these optimizations enable video editing at 12 frames per second on mobile devices, while maintaining high quality. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-editing/
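To illustrate why distilling classifier-free guidance across multiple modalities can save compute, here is a minimal sketch. It assumes two conditions (a source image and a text instruction) combined in the style commonly used for instruction-based image editing; the abstract does not name the conditions or the exact formula, so this is an assumption. Under that assumption, each guided sampling step needs three forward passes, while a guidance-distilled student that takes the guidance scales as inputs needs one. This is consistent with the threefold speedup reported, though the abstract does not spell out this accounting. All names here (toy_denoiser, toy_student, CountingModel) are hypothetical stand-ins, not the authors' code.

```python
from typing import Callable, List, Optional

Vector = List[float]


class CountingModel:
    """Wraps a callable and counts how many forward passes are made."""

    def __init__(self, fn: Callable[..., Vector]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, *args) -> Vector:
        self.calls += 1
        return self.fn(*args)


def toy_denoiser(z: Vector, image_cond: Optional[Vector],
                 text_cond: Optional[Vector]) -> Vector:
    """Toy stand-in for a network: adds each condition that is present."""
    out = list(z)
    if image_cond is not None:
        out = [o + c for o, c in zip(out, image_cond)]
    if text_cond is not None:
        out = [o + c for o, c in zip(out, text_cond)]
    return out


def guided_estimate(model, z, image_cond, text_cond, s_img, s_txt) -> Vector:
    """Two-condition classifier-free guidance: three forward passes per step."""
    e_uncond = model(z, None, None)           # no conditioning
    e_img = model(z, image_cond, None)        # image only
    e_full = model(z, image_cond, text_cond)  # image and text
    return [
        u + s_img * (i - u) + s_txt * (f - i)
        for u, i, f in zip(e_uncond, e_img, e_full)
    ]


def toy_student(z, image_cond, text_cond, s_img, s_txt) -> Vector:
    """Toy 'distilled' model: takes the guidance scales as extra inputs and
    returns the guided output in one pass. A real student would be trained
    to match the teacher; here the closed form of the toy teacher is used."""
    return [
        zi + s_img * ci + s_txt * ti
        for zi, ci, ti in zip(z, image_cond, text_cond)
    ]


# Hypothetical inputs and guidance scales, chosen only for the demo.
z = [0.0, 1.0]
image_cond = [1.0, 1.0]
text_cond = [2.0, 0.0]
s_img, s_txt = 1.5, 7.5

teacher = CountingModel(toy_denoiser)
student = CountingModel(toy_student)

guided = guided_estimate(teacher, z, image_cond, text_cond, s_img, s_txt)
distilled = student(z, image_cond, text_cond, s_img, s_txt)

print("teacher passes per step:", teacher.calls)
print("student passes per step:", student.calls)
print("outputs match:", guided == distilled)
```

The pass count per step is separate from the number of sampling steps. The adversarial distillation described in the abstract reduces the step count to one, so the two savings multiply.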