SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

Ying Zeng, Miaosen Luo, Guangyuan Li, Yang Yang, Ruiyang Fan, Linxiao Shi, Qirui Yang, Jian Zhang, Chengcheng Liu, Siming Zheng, Jinwei Chen, Bo Li, Peng-Tao Jiang

2026-04-22

Summary

This paper introduces SmartPhotoCrafter, a new system that automatically edits photos to make them look better without needing a user to tell it exactly what to do.

What's the problem?

Usually, when you want to edit a photo, you need to know a lot about photography and tell the editing software specifically what changes to make. This is hard for people who aren't experts, and even experts sometimes struggle to clearly explain what they want a photo to *feel* like. Existing methods rely on very specific instructions, which can be difficult to provide or may not capture the desired aesthetic.

What's the solution?

SmartPhotoCrafter works in two main steps. First, a module called the 'Image Critic' analyzes the photo and identifies what's wrong with it, such as poor lighting or color issues. Then a second module, the 'Photographic Artist', makes targeted edits to fix those problems and make the photo more appealing. The system is trained in three stages: foundation pretraining to learn basic aesthetic understanding and editing skills, adaptation on examples that pair reasoning with multiple edits, and finally reinforcement learning that improves the critic's reasoning and the artist's editing together. The authors also built a stage-specific dataset to support each phase of training.
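
The critic-then-artist flow can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the function names, the thresholds, and the hand-written brightness and contrast rules are hypothetical stand-ins for the learned Image Critic and Photographic Artist modules, and do not reflect the authors' implementation.

```python
from statistics import mean

# Toy stand-in for a photo: grayscale pixels in [0, 1] (assumption for illustration).
Image = list[list[float]]


def critique(image: Image) -> list[str]:
    """Toy 'critic': name deficiencies from simple statistics (thresholds are assumed)."""
    pixels = [p for row in image for p in row]
    issues = []
    if mean(pixels) < 0.35:
        issues.append("underexposed")
    if max(pixels) - min(pixels) < 0.4:
        issues.append("low_contrast")
    return issues


def retouch(image: Image, issues: list[str]) -> Image:
    """Toy 'artist': apply one targeted fix per reported deficiency."""
    out = [row[:] for row in image]
    if "underexposed" in issues:
        # Brighten with a fixed gain, clipped to the valid range.
        out = [[min(1.0, p * 1.8) for p in row] for row in out]
    if "low_contrast" in issues:
        # Stretch the pixel range to span [0, 1].
        pixels = [p for row in out for p in row]
        lo, hi = min(pixels), max(pixels)
        span = (hi - lo) or 1.0
        out = [[(p - lo) / span for p in row] for row in out]
    return out


def auto_enhance(image: Image) -> tuple[list[str], Image]:
    """Run the critic, then hand its findings to the artist; no user instruction needed."""
    issues = critique(image)
    return issues, retouch(image, issues)
```

Calling `auto_enhance` on a dark, flat image returns both the list of detected problems and the corrected pixels, while an image with no detected problems comes back unchanged. The point of the sketch is only the control flow: the edit is driven by the critic's findings rather than by a user prompt.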

Why does it matter?

This matters because people can get better-looking photos without being photography experts or writing detailed editing instructions, which makes photo editing more accessible. The system handles both restoration (repairing degraded images) and retouching (adjusting color and tone). According to the authors, its edits look photo-realistic and keep colors and tones consistent.

Abstract

Traditional photographic image editing typically requires users to possess sufficient aesthetic understanding to provide appropriate instructions for adjusting image quality and camera parameters. However, this paradigm relies on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or inaccessible to non-expert users. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method which formulates image editing as a tightly coupled reasoning-to-generation process. The proposed model first performs image quality comprehension and identifies deficiencies by the Image Critic module, and then the Photographic Artist module realizes targeted edits to enhance image appeal, eliminating the need for explicit human instructions. A multi-stage training pipeline is adopted: (i) Foundation pretraining to establish basic aesthetic understanding and editing capabilities, (ii) Adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) Coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation, while supporting both image restoration and retouching tasks with consistent adherence to color- and tone-related semantics. We also construct a stage-specific dataset, which progressively builds reasoning and controllable generation, effective cross-module collaboration, and ultimately high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on the task of automatic photographic enhancement, achieving photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions. Project page: https://github.com/vivoCameraResearch/SmartPhotoCrafter.
