EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
Kunho Kim, Sumin Seo, Yongjun Cho, Hyungjin Chung
2026-04-24
Summary
This paper introduces EditCrafter, a new way to edit images at high resolutions using artificial intelligence, specifically a type of AI called diffusion models that were originally designed to create images from text.
What's the problem?
Existing image editing tools using diffusion models struggle with high-resolution images or images that aren't square. They were trained on smaller, standard-sized images, and simply applying the editing process to larger images piece by piece doesn't work well, often resulting in unrealistic or repetitive details. It's hard to make changes to images of any size or shape without retraining the AI, which is time-consuming and requires a lot of data.
What's the solution?
EditCrafter solves this in two steps. First, it performs tiled inversion: the high-resolution image is split into tiles and inverted tile by tile into a latent the model can edit from, which preserves the original image's identity. Second, it edits from that inverted latent using noise-damped manifold-constrained classifier-free guidance (NDCFG++), a guidance technique tailored to high-resolution editing. Neither step requires retraining or fine-tuning the model, so the method can be applied across different resolutions and aspect ratios.
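To make the tiling idea concrete, here is a minimal sketch of splitting a large latent into overlapping tiles and merging them back. It is not the paper's tiled inversion procedure, which this summary does not specify: the tile size, the stride, the averaging of overlaps, and the function names `tile_starts`, `split_into_tiles`, and `merge_tiles` are all assumptions chosen for illustration. In a real pipeline, each tile would be passed through the diffusion model before merging.

```python
import numpy as np

def tile_starts(size, tile, stride):
    """Start offsets so that tiles of length `tile` cover [0, size)."""
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the border
    return starts

def split_into_tiles(latent, tile=64, stride=48):
    """Return (y, x, patch) for overlapping tiles of a (C, H, W) array."""
    _, h, w = latent.shape
    return [
        (y, x, latent[:, y:y + tile, x:x + tile])
        for y in tile_starts(h, tile, stride)
        for x in tile_starts(w, tile, stride)
    ]

def merge_tiles(tiles, shape):
    """Average overlapping tiles back into a full (C, H, W) array."""
    out = np.zeros(shape, dtype=np.float64)
    weight = np.zeros(shape[1:], dtype=np.float64)
    for y, x, patch in tiles:
        ph, pw = patch.shape[1:]
        out[:, y:y + ph, x:x + pw] += patch
        weight[y:y + ph, x:x + pw] += 1.0
    return out / weight  # every position is covered by at least one tile

# Hypothetical non-square latent; tiles are left unprocessed here.
latent = np.random.default_rng(0).normal(size=(4, 100, 160))
tiles = split_into_tiles(latent)
merged = merge_tiles(tiles, latent.shape)
```

Because the tiles are merged without modification, `merged` reproduces `latent`. The overlap (stride smaller than tile size) is a common choice for reducing visible seams between tiles once each one is processed independently.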
Why it matters?
This research is important because it makes powerful image editing technology accessible for a wider range of images and applications. Being able to edit high-resolution images without needing to retrain the AI opens up possibilities for things like professional photo editing, creating detailed artwork, and manipulating images for scientific visualization, all without the usual limitations of existing methods.
Abstract
We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training. Leveraging the generative priors of large-scale T2I diffusion models enables the development of a wide array of novel generation and editing applications. Although numerous image editing methods have been proposed based on diffusion models and exhibit high-quality editing results, they are difficult to apply to images with arbitrary aspect ratios or higher resolutions since they only work at the training resolutions (512x512 or 1024x1024). Naively applying patch-wise editing fails with unrealistic object structures and repetition. To address these challenges, we introduce EditCrafter, a simple yet effective editing pipeline. EditCrafter operates by first performing tiled inversion, which preserves the original identity of the input high-resolution image. We further propose a noise-damped manifold-constrained classifier-free guidance (NDCFG++) that is tailored for high-resolution image editing from the inverted latent. Our experiments show that EditCrafter can achieve impressive editing results across various resolutions without fine-tuning or optimization.
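For background, the standard classifier-free guidance rule that NDCFG++ builds on can be written as follows. This is the widely used baseline formulation, not the NDCFG++ update itself, whose noise damping and manifold constraint are not specified in this abstract. The symbols are assumptions introduced here: $\epsilon_\theta$ is the pretrained noise predictor, $x_t$ the noisy latent at step $t$, $c$ the text condition, $\varnothing$ the empty (unconditional) prompt, and $w$ the guidance scale.

```latex
\hat{\epsilon}_\theta(x_t, c)
  = \epsilon_\theta(x_t, \varnothing)
  + w \,\bigl( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \bigr)
```

With $w = 1$ this reduces to the plain conditional prediction $\epsilon_\theta(x_t, c)$; larger $w$ pushes the sample further toward the text condition.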