Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
Rishubh Parihar, Or Patashnik, Daniil Ostashev, R. Venkatesh Babu, Daniel Cohen-Or, Kuan-Chieh Wang
2025-10-15
Summary
This paper introduces a new way to edit images using text instructions, but with the added ability to control *how much* of the edit actually happens.
What's the problem?
Normally, when you tell a computer to edit an image with a text command, it simply applies the edit. You can't easily say 'make it a little more stylized' or 'change the background, but not too much'. Existing methods lack a way to smoothly adjust the strength or extent of an image edit specified by a text instruction.
What's the solution?
The researchers created a model called Kontinuous Kontext. It takes both a text instruction *and* a number representing the edit strength. This number tells the model how strongly to apply the edit, ranging from no change at all to a full change. They trained a small piece of the model, called a 'projector network', to understand how to use this strength number along with the text instruction to control the editing process. To get enough examples to train the model, they used other AI programs to create a large dataset of images, edits, instructions, and strength values, then checked the quality of the data.
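The projector network described above can be pictured as a small mapping from (strength scalar, instruction embedding) to modulation coefficients that steer the frozen editing backbone. The sketch below is a hypothetical illustration only: the layer sizes, the two-layer MLP design, and the identity-blending at strength zero are assumptions for clarity, not the paper's exact architecture.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation) of a projector network:
# it maps a scalar edit strength plus an instruction embedding to scale/shift
# modulation coefficients. All dimensions and weights are made-up placeholders.
rng = np.random.default_rng(0)

EMB_DIM = 8    # instruction-embedding size (assumed)
MOD_DIM = 4    # number of modulation coefficients (assumed)
HIDDEN = 16

# Randomly initialized weights stand in for trained parameters.
W1 = rng.normal(0, 0.1, (EMB_DIM + 1, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, 2 * MOD_DIM))
b2 = np.zeros(2 * MOD_DIM)

def projector(strength, instruction_emb):
    """Map (strength, instruction embedding) -> (scale, shift) coefficients."""
    x = np.concatenate([[strength], instruction_emb])
    h = np.tanh(x @ W1 + b1)
    out = h @ W2 + b2
    raw_scale, raw_shift = out[:MOD_DIM], out[MOD_DIM:]
    # Blend toward identity modulation so that strength 0 means "no edit".
    scale = 1.0 + strength * raw_scale
    shift = strength * raw_shift
    return scale, shift

emb = rng.normal(size=EMB_DIM)
s0, t0 = projector(0.0, emb)  # strength 0: identity scale, zero shift
s1, t1 = projector(1.0, emb)  # full edit strength
```

Sweeping the strength argument from 0 to 1 then yields a smooth family of modulation coefficients, which is the continuous-control knob the paper exposes to the user.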
Why does it matter?
This is important because it gives users much more control over image editing. Instead of getting just one result from a text prompt, you can now fine-tune the edit to get exactly the look you want, whether it's a subtle change or a dramatic transformation, and it works for many different types of edits like style, attributes, or backgrounds.
Abstract
Instruction-based image editing offers a powerful and intuitive way to manipulate images through natural language. Yet, relying solely on text instructions limits fine-grained control over the extent of edits. We introduce Kontinuous Kontext, an instruction-driven editing model that provides a new dimension of control over edit strength, enabling users to adjust edits gradually from no change to a fully realized result in a smooth and continuous manner. Kontinuous Kontext extends a state-of-the-art image editing model to accept an additional input, a scalar edit strength, which is then paired with the edit instruction, enabling explicit control over the extent of the edit. To inject this scalar information, we train a lightweight projector network that maps the input scalar and the edit instruction to coefficients in the model's modulation space. For training our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models, followed by a filtering stage to ensure quality and consistency. Kontinuous Kontext provides a unified approach for fine-grained control over edit strength for instruction-driven editing, from subtle to strong, across diverse operations such as stylization, attribute, material, background, and shape changes, without requiring attribute-specific training.