Loomis Painter: Reconstructing the Painting Process
Markus Pobitzer, Chang Liu, Chenyi Zhuang, Teng Long, Bin Ren, Nicu Sebe
2025-11-24
Summary
This paper introduces a new way to automatically generate painting tutorials, similar to those you might find on YouTube, but with more control and consistency.
What's the problem?
Currently, learning to paint from online videos is limited because those videos aren't interactive or tailored to your skill level. While computers can now *create* art, they often struggle to mimic the step-by-step process a human artist uses, especially when switching between different painting styles or mediums like watercolor versus oil paint. The generated images can look disjointed or unnatural, lacking the smooth progression you'd see in a real painting.
What's the solution?
The researchers developed a system that uses artificial intelligence, specifically something called a diffusion model, to generate painting processes. They trained this AI on a large collection of real painting videos. A key part of their approach is a way to control the style of the painting and ensure that the textures and details evolve realistically over time, even when switching between different mediums. They also used a "reverse painting" training strategy to make the AI generate steps in an order that a human artist would actually follow.
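The exact training procedure is not spelled out in this summary, so the following is only a minimal sketch under the assumption that "reverse painting" means teaching the model to walk backwards from the finished artwork toward the blank canvas. The function and variable names below are hypothetical, not the authors' code.

```python
# Hypothetical sketch of "reverse-painting" data preparation:
# given frames ordered from blank canvas to finished painting, build
# (current canvas -> previous, less-finished canvas) training pairs.
def reverse_painting_pairs(frames):
    """frames: list of canvases ordered from blank canvas to final artwork."""
    pairs = []
    for t in range(len(frames) - 1, 0, -1):
        pairs.append((frames[t], frames[t - 1]))  # step backwards one stage
    return pairs

# Example: reverse_painting_pairs(["blank", "sketch", "color", "final"])
# -> [("final", "color"), ("color", "sketch"), ("sketch", "blank")]
```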
Why it matters?
This work is important because it could lead to personalized and interactive painting tutorials that adapt to your learning style. It also pushes the boundaries of what AI can do in terms of understanding and replicating complex creative processes, potentially helping artists explore new ideas or automating parts of their workflow.
Abstract
Step-by-step painting tutorials are vital for learning artistic techniques, but existing video resources (e.g., YouTube) lack interactivity and personalization. While recent generative models have advanced artistic image synthesis, they struggle to generalize across media and often show temporal or structural inconsistencies, hindering faithful reproduction of human creative workflows. To address this, we propose a unified framework for multi-media painting process generation with a semantics-driven style control mechanism that embeds multiple media into a diffusion model's conditional space and uses cross-medium style augmentation. This enables consistent texture evolution and process transfer across styles. A reverse-painting training strategy further ensures smooth, human-aligned generation. We also build a large-scale dataset of real painting processes and evaluate cross-media consistency, temporal coherence, and final-image fidelity, achieving strong results on LPIPS, DINO, and CLIP metrics. Finally, our Perceptual Distance Profile (PDP) curve quantitatively models the creative sequence, i.e., composition, color blocking, and detail refinement, mirroring human artistic progression.
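The abstract does not give a formula for the PDP curve, so the sketch below is only an illustration, assuming LPIPS as the perceptual distance and PyTorch tensors as frames (names are hypothetical): it measures the distance between each intermediate canvas and the finished painting, which would fall off as the process moves through composition, color blocking, and detail refinement.

```python
# Hypothetical sketch of a Perceptual Distance Profile (PDP):
# LPIPS distance from every intermediate canvas to the finished painting.
# Assumes each frame is a torch tensor of shape (3, H, W) scaled to [-1, 1].
import torch
import lpips  # pip install lpips

def perceptual_distance_profile(frames):
    """Return one LPIPS value per step; a falling curve would mirror
    composition -> color blocking -> detail refinement."""
    metric = lpips.LPIPS(net="alex")
    final = frames[-1].unsqueeze(0)  # finished painting as the reference
    with torch.no_grad():
        return [metric(f.unsqueeze(0), final).item() for f in frames]

# Example usage with random stand-in frames:
# frames = [torch.rand(3, 256, 256) * 2 - 1 for _ in range(10)]
# print(perceptual_distance_profile(frames))
```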