PixelSmile: Toward Fine-Grained Facial Expression Editing
Jiabin Hua, Hengyuan Xu, Aojie Li, Wei Cheng, Gang Yu, Xingjun Ma, Yu-Gang Jiang
2026-03-28
Summary
This paper focuses on making it easier to realistically change facial expressions in images, going beyond simple smiles or frowns to more nuanced and continuous changes.
What's the problem?
Currently, when trying to edit facial expressions in images, the different expressions tend to blend together and aren't clearly defined. It's hard to isolate and control specific aspects of an expression without messing up the person's overall face or making the changes look unnatural. Existing methods struggle with both accurately editing the expression *and* keeping the person's identity consistent throughout the change.
What's the solution?
The researchers created a new dataset called FFE (Flex Facial Expression) with detailed labels for a wide range of expressions, and a testing framework called FFE-Bench to measure how well different methods perform. They then developed a new technique called PixelSmile, built on a diffusion model, a type of generative AI. PixelSmile is trained in a special way to separate the different components of an expression, making them more distinct. It combines supervision on how strong an expression should be with contrastive learning, which pushes different expressions apart, enabling precise and smooth control over the expression editing process.
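The "precise and smooth control" comes from linearly interpolating between text-conditioning latents, per the abstract. A minimal sketch of that idea, with toy numpy vectors standing in for real text-encoder outputs (the names `z_neutral`, `z_smile`, and the dimensions are illustrative, not taken from the paper):

```python
import numpy as np

def interpolate_text_latents(z_neutral, z_target, alpha):
    """Linearly blend two text-conditioning latents.

    alpha = 0.0 reproduces the neutral expression, alpha = 1.0 the full
    target expression; intermediate values give graded intensities.
    """
    return (1.0 - alpha) * z_neutral + alpha * z_target

# Toy stand-ins for text-encoder outputs; a real pipeline would embed
# prompts like "a neutral face" and "a smiling face" with its encoder.
rng = np.random.default_rng(0)
z_neutral = rng.standard_normal(8)
z_smile = rng.standard_normal(8)

# Sweeping alpha yields a smooth trajectory of conditioning vectors,
# one per intensity step, to feed the diffusion model.
trajectory = [interpolate_text_latents(z_neutral, z_smile, a)
              for a in np.linspace(0.0, 1.0, 5)]
```

Because the interpolation is linear in the latent space, equal steps in `alpha` correspond to equal steps along the conditioning trajectory, which is what makes the resulting intensity control linear and stable.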
Why does it matter?
This work is important because it allows for much more realistic and controllable facial expression editing. This has potential applications in areas like creating more expressive avatars, improving special effects in movies, and even helping people with communication difficulties by allowing them to better express their emotions digitally.
Abstract
Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, we construct the Flex Facial Expression (FFE) dataset with continuous affective annotations and establish FFE-Bench to evaluate structural confusion, editing accuracy, linear controllability, and the trade-off between expression editing and identity preservation. We propose PixelSmile, a diffusion framework that disentangles expression semantics via fully symmetric joint training. PixelSmile combines intensity supervision with contrastive learning to produce stronger and more distinguishable expressions, achieving precise and stable linear expression control through textual latent interpolation. Extensive experiments demonstrate that PixelSmile achieves superior disentanglement and robust identity preservation, confirming its effectiveness for continuous, controllable, and fine-grained expression editing, while naturally supporting smooth expression blending.
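The contrastive component mentioned above can be illustrated with a generic InfoNCE-style loss over expression embeddings (a standard formulation, not necessarily the paper's exact objective): same-expression pairs are pulled together, different expressions pushed apart, which is what makes edited expressions "more distinguishable".

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE contrastive loss over L2-normalized embeddings.

    Pulls the anchor toward the positive (same expression) and pushes
    it away from the negatives (other expressions).
    """
    def norm(v):
        return v / np.linalg.norm(v)

    anchor, positive = norm(anchor), norm(positive)
    negatives = [norm(n) for n in negatives]

    # Cosine similarities; the positive occupies index 0.
    logits = np.array([anchor @ positive] + [anchor @ n for n in negatives])
    logits /= temperature
    logits -= logits.max()  # numerical stability before exponentiation
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

The loss is low when the anchor is most similar to its positive, and grows when any negative embedding is closer, giving a training signal that separates overlapping expression semantics.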