WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing
Kaihang Pan, Weile Chen, Haiyi Qiu, Qifan Yu, Wendong Bu, Zehan Wang, Yun Zhu, Juncheng Li, Siliang Tang
2025-12-02
Summary
This paper introduces a new way to test whether image editing programs can actually 'think' and be creative, going beyond simple edits to see if they understand what they are doing.
What's the problem?
Current tests for image editing models are too simple and don't really challenge them to use knowledge or creativity. They focus on basic changes instead of evaluating whether the model understands the content of the image or can make complex, thoughtful edits the way a human would.
What's the solution?
The researchers created a benchmark called WiseEdit, which breaks image editing down into three stages: first, understanding *what* is in the image (Awareness); then interpreting *what it means* (Interpretation); and finally, imagining *how to change it* creatively (Imagination). WiseEdit includes 1,220 test cases that require different kinds of knowledge – facts, how-to procedures, and even awareness of *how* to think creatively – to measure how well models perform at each step and in combination.
Why does it matter?
This matters because it provides a much more realistic and challenging way to evaluate image editing models. By revealing where these models struggle with knowledge and creative reasoning, it helps researchers build better, more intelligent image editing tools that can genuinely assist human creativity.
Abstract
Recent image editing models boast next-level intelligent capabilities, facilitating cognition- and creativity-informed image editing. Yet, existing benchmarks provide too narrow a scope for evaluation, failing to holistically assess these advanced abilities. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring deep task depth and broad knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps, i.e., Awareness, Interpretation, and Imagination, each corresponding to a task that poses a challenge for models to complete at the specific step. It also encompasses complex tasks, where none of the three steps can be finished easily. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive knowledge. Ultimately, WiseEdit comprises 1,220 test cases, objectively revealing the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition capabilities. The benchmark, evaluation code, and the generated images of each model will be made publicly available soon. Project Page: https://qnancy.github.io/wiseedit_project_page/.