Aesthetic Alignment Risks Assimilation: How Image Generation and Reward Models Reinforce Beauty Bias and Ideological "Censorship"
Wenqi Marshall Guo, Qingyun Qian, Khalad Hasan, Shan Du
2025-12-16
Summary
This paper investigates how image generation models, such as those that create pictures from text, consistently produce images that are considered conventionally 'beautiful', even when the user explicitly asks for something ugly or unusual.
What's the problem?
The core issue is that these models are trained to align with what developers *think* looks good, rather than what the *user* wants. This means if you ask for a 'bad' or 'low-quality' image for artistic reasons, the model will often ignore your request and still give you something aesthetically pleasing. It's like the model has its own idea of beauty and won't deviate from it, even when told to do so, limiting creative control and diverse artistic expression.
What's the solution?
Researchers created a dataset spanning a wide range of aesthetic qualities, from very beautiful to intentionally unattractive images. They then tested how well current image generation models and the systems that 'judge' image quality (reward models) responded to requests for different types of images. They found that the generators defaulted to conventionally attractive outputs, and that the reward models penalized images that weren't conventionally attractive, even when those images perfectly matched the user's instructions. They also tested this with image-to-image editing and compared the results against real abstract artworks.
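To make that evaluation setup concrete, here is a minimal, hypothetical sketch of the kind of reward-model probe the paper describes. It is not the authors' code: `load_reward_model` and `reward_score` are placeholder names standing in for whatever aesthetic or preference scorer is under test, and the faithful/off-prompt split is an illustrative assumption about how the dataset might be labeled.

```python
"""Hypothetical probe of reward-model aesthetic bias (a sketch, not the
paper's actual code). `load_reward_model` and `reward_score` are stand-ins
for whichever preference/aesthetic scorer is being evaluated."""

from pathlib import Path
from statistics import mean


def load_reward_model():
    """Placeholder: return a handle to the reward model being evaluated."""
    raise NotImplementedError("plug in a real reward model here")


def reward_score(model, prompt: str, image_path: Path) -> float:
    """Placeholder: return the model's scalar reward for (prompt, image)."""
    raise NotImplementedError("plug in the model's scoring call here")


def probe_anti_aesthetic_bias(model, pairs):
    """pairs: iterable of (prompt, image_path, is_faithful), where
    `is_faithful` marks images that genuinely satisfy an explicitly
    anti-aesthetic prompt. A systematic bias shows up when these
    faithful pairs score below conventionally 'beautiful' images
    that ignore the prompt."""
    scored = [
        (is_faithful, reward_score(model, prompt, image_path))
        for prompt, image_path, is_faithful in pairs
    ]
    faithful = [s for ok, s in scored if ok]
    off_prompt = [s for ok, s in scored if not ok]
    if faithful and off_prompt:
        print(f"mean reward, prompt-faithful anti-aesthetic images: {mean(faithful):.3f}")
        print(f"mean reward, off-prompt 'beautiful' images:         {mean(off_prompt):.3f}")
    return scored
```

Under this sketch, the bias the paper reports would appear as the prompt-faithful anti-aesthetic images receiving systematically lower rewards than off-prompt but conventionally attractive ones.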
Why it matters?
This research is important because it highlights a bias in AI image generation that restricts user freedom and artistic possibilities. If models always prioritize a single aesthetic standard, they stifle creativity and cannot accommodate the full range of artistic expression. Recognizing and addressing this bias is crucial for building AI tools that truly empower users and respect their individual visions.
Abstract
Over-aligning image generation models to a generalized aesthetic preference conflicts with user intent, particularly when "anti-aesthetic" outputs are requested for artistic or critical purposes. This adherence prioritizes developer-centered values, compromising user autonomy and aesthetic pluralism. We test this bias by constructing a wide-spectrum aesthetics dataset and evaluating state-of-the-art generation and reward models. We find that aesthetic-aligned generation models frequently default to conventionally beautiful outputs, failing to respect instructions for low-quality or negative imagery. Crucially, reward models penalize anti-aesthetic images even when they perfectly match the explicit user prompt. We confirm this systemic bias through image-to-image editing and evaluation against real abstract artworks.