OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation
Hao Yu, Jiabo Zhan, Zile Wang, Jinglin Wang, Huaisong Zhang, Hongyu Li, Xinrui Chen, Yongxian Wei, Chun Yuan
2025-11-26
Summary
This paper introduces a new system called OmniAlpha that can generate and edit images with all four color channels – red, green, blue, and alpha (transparency). It is designed to handle many different image tasks within a single model, unlike existing methods that typically specialize in just one.
What's the problem?
Currently, creating images with transparency is tricky. Models that are good at generating realistic color (RGB) images usually don't handle transparency (alpha) well, while models that *do* support transparency are often limited to a single specific task, like removing backgrounds. There's a need for one model that can perform many different tasks across all four channels.
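To make concrete what the extra alpha channel encodes, here is a minimal sketch of the standard Porter–Duff "over" operation, which blends an RGBA foreground layer onto an opaque RGB background. This is generic compositing math for illustration, not code from the paper.

```python
import numpy as np

def composite_over(fg_rgba, bg_rgb):
    """Porter-Duff 'over' with straight (non-premultiplied) alpha.

    fg_rgba: float array of shape (..., 4), values in [0, 1]
    bg_rgb:  float array of shape (..., 3), values in [0, 1]
    Returns the blended RGB image: fg*alpha + bg*(1 - alpha).
    """
    alpha = fg_rgba[..., 3:4]          # keep a trailing axis for broadcasting
    return fg_rgba[..., :3] * alpha + bg_rgb * (1.0 - alpha)

# A half-transparent red pixel over a blue background yields purple.
fg = np.array([[1.0, 0.0, 0.0, 0.5]])
bg = np.array([[0.0, 0.0, 1.0]])
print(composite_over(fg, bg))  # → [[0.5 0.  0.5]]
```

An RGB-only generator has no way to produce the `alpha` values this operation needs, which is why layer-aware RGBA generation requires handling the fourth channel natively.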
What's the solution?
The researchers created OmniAlpha, a new model built on a powerful architecture called a Diffusion Transformer. A key innovation is MSRoPE-BiL, a positional-encoding scheme with a bi-directionally extendable layer axis that lets the model process multiple input and target RGBA layers at the same time. To train this model, they also built a new dataset called AlphaLayers, containing 1,000 high-quality multi-layer image triplets with transparency information, produced by an automated synthesis and filtering pipeline. They then jointly trained OmniAlpha on 21 different generation and editing tasks using this dataset.
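One way to picture a "bi-directionally extendable layer axis" is as a third positional coordinate alongside the usual row/column grid, where input layers count downward and target layers count upward, so new layers can be appended on either side without renumbering existing ones. The sketch below only illustrates that indexing idea; the function name and the exact index assignment are assumptions, not the paper's actual MSRoPE-BiL definition.

```python
def layer_position_ids(n_inputs, n_targets, h, w):
    """Hypothetical sketch: assign (layer, row, col) position indices
    for a stack of image layers fed to a transformer.

    Input layers get negative layer indices (-n_inputs .. -1) and
    target layers get non-negative ones (0 .. n_targets-1), so the
    layer axis extends in both directions independently.
    """
    layer_axis = list(range(-n_inputs, 0)) + list(range(n_targets))
    return [(l, r, c)
            for l in layer_axis      # layer coordinate
            for r in range(h)        # row within the layer
            for c in range(w)]       # column within the layer

ids = layer_position_ids(n_inputs=2, n_targets=1, h=2, w=2)
print(len(ids), ids[0], ids[-1])  # → 12 (-2, 0, 0) (0, 1, 1)
```

Under a rotary scheme, each coordinate of such a tuple would drive its own rotation frequencies, so tokens from different layers at the same spatial location stay distinguishable.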
Why it matters?
This work shows that it’s possible to build a single model that’s really good at handling all aspects of image generation and editing, including transparency. This is a big step forward because it means we can create more versatile and powerful image editing tools that can do a wider range of tasks without needing separate specialized models.
Abstract
Generative models have excelled in RGB synthesis, but real-world applications require RGBA manipulation. This has led to a fragmented landscape: specialized, single-task models handle alpha but lack versatility, while unified multi-task frameworks are confined to the RGB domain. To bridge this critical gap, we propose OmniAlpha, the first unified, multi-task generative framework for sequence-to-sequence RGBA image generation and editing. Its architecture features MSRoPE-BiL, a novel RoPE method with a bi-directionally extendable layer axis for its Diffusion Transformer (DiT) backbone, enabling the concurrent processing of multiple input and target RGBA layers. To power this framework, we introduce AlphaLayers, a new dataset of 1,000 high-quality, multi-layer triplets, built via a novel automated synthesis and filter pipeline. Jointly training OmniAlpha on this dataset across a comprehensive suite of 21 diverse tasks, extensive experiments demonstrate that our unified approach consistently outperforms strong, specialized baselines. Most notably, OmniAlpha achieves a dramatic 84.8% relative reduction in SAD for mask-free matting on AIM-500 and wins over 90% of human preferences in layer-conditioned completion. Our work proves that a unified, multi-task model can learn a superior shared representation for RGBA, paving the way for more powerful, layer-aware generative systems.