Pathways on the Image Manifold: Image Editing via Video Generation
Noam Rotstein, Gal Yona, Daniel Silver, Roy Velich, David Bensaïd, Ron Kimmel
2024-11-27

Summary
This paper presents a new method for editing images by treating the process as a video generation task, allowing for smoother and more accurate transitions between the original image and the edited version.
What's the problem?
Current image editing techniques often struggle to follow complex instructions and can unintentionally change important details of the original image. This makes it difficult to create high-quality edits that maintain the essence of the original picture.
What's the solution?
The authors propose a method that uses pretrained video models to create smooth transitions from the original image to the edited version. By treating image editing as a temporal process, they ensure that changes occur gradually and logically, rather than all at once. This approach helps maintain the key features of the original image while accurately applying the desired edits.
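The core idea — treat the edit as the final frame of a smooth transition generated from the source image — can be sketched in a few lines. Note this is only a conceptual illustration: the `generate_transition` function below stubs the pretrained image-to-video model with simple linear interpolation, whereas the paper conditions an actual video diffusion model on the source image and the edit instruction.

```python
import numpy as np

def generate_transition(source, target_hint, num_frames=8):
    """Produce a sequence of frames moving from `source` toward an edit.

    Hypothetical stand-in: interpolation replaces the pretrained
    image-to-video model used in the paper, purely to illustrate the
    temporal formulation of editing.
    """
    alphas = np.linspace(0.0, 1.0, num_frames)
    return [(1 - a) * source + a * target_hint for a in alphas]

def edit_image(source, target_hint):
    # The edited image is the last frame of the generated transition,
    # so changes accumulate gradually rather than all at once.
    frames = generate_transition(source, target_hint)
    return frames[-1]
```

Because the intermediate frames stay close to one another, each step makes only a small change to the image, which is what lets the method preserve the original image's key features while the edit is applied.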
Why it matters?
This research is important because it makes image editing more reliable, producing results that look natural and coherent while staying faithful to the original picture. By leveraging video generation techniques, the method can enhance applications in digital art, filmmaking, and content creation, giving users better tools for visual storytelling.
Abstract
Recent advances in image editing, driven by image diffusion models, have shown remarkable progress. However, significant challenges remain, as these models often struggle to follow complex edit instructions accurately and frequently compromise fidelity by altering key elements of the original image. Simultaneously, video generation has made remarkable strides, with models that effectively function as consistent and continuous world simulators. In this paper, we propose merging these two fields by utilizing image-to-video models for image editing. We reformulate image editing as a temporal process, using pretrained video models to create smooth transitions from the original image to the desired edit. This approach traverses the image manifold continuously, ensuring consistent edits while preserving the original image's key aspects. Our approach achieves state-of-the-art results on text-based image editing, demonstrating significant improvements in both edit accuracy and image preservation.