OmniGen2 improves substantially on its predecessor, OmniGen 1.0, and achieves state-of-the-art performance among open-source models. It can process and flexibly combine diverse inputs, including humans, reference objects, and scenes, to produce novel and coherent visual outputs. Results are not deterministic, however: it may take multiple attempts to get a satisfactory image, and the model does not always follow instructions precisely. To improve generation quality, provide high-quality input images, write specific instructions, and prefer English prompts.
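For illustration, here is a minimal sketch of combining reference images with a specific English prompt. The import path, checkpoint ID, and parameter names (`input_images`, `text_guidance_scale`, `image_guidance_scale`) are assumptions modeled on the diffusers convention, not confirmed API; check the OmniGen2 repository's example scripts for the exact interface.

```python
# Hypothetical usage sketch -- import path, checkpoint ID, and parameter
# names are assumptions; verify against the OmniGen2 repository.
import torch
from PIL import Image
from omnigen2.pipelines.omnigen2.pipeline_omnigen2 import OmniGen2Pipeline  # assumed path

pipeline = OmniGen2Pipeline.from_pretrained(
    "OmniGen2/OmniGen2", torch_dtype=torch.bfloat16
).to("cuda")

# Be specific: name the subjects, the action, and the setting, in English.
prompt = (
    "Place the woman from the first image next to the dog "
    "from the second image on a sunny beach."
)
# Use high-quality, well-lit reference images for best results.
images = [Image.open("woman.png"), Image.open("dog.png")]

result = pipeline(
    prompt=prompt,
    input_images=images,       # assumed parameter name for reference inputs
    num_inference_steps=50,
    text_guidance_scale=5.0,   # assumed names; see the repo's example scripts
    image_guidance_scale=2.0,
)
result.images[0].save("output.png")
```

If the first attempt misses the instruction, rerunning with a different seed or a more explicit prompt often helps, consistent with the multiple-attempts caveat above.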
OmniGen2 requires an NVIDIA RTX 3090 or an equivalent GPU with approximately 17 GB of VRAM to run natively; on devices with less VRAM, enabling CPU offload lets the model run at the cost of some speed. Inference can be accelerated by decreasing the cfg_range_end parameter, which has a negligible impact on output quality. OmniGen2 also supports TeaCache and TaylorSeer for faster inference, and key hyperparameters can be tuned to suit each use case.
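As a sketch of the low-VRAM and speed options, the snippet below assumes a diffusers-style pipeline that exposes `enable_model_cpu_offload()` and accepts `cfg_range_end` as a call argument; both names are assumptions to verify against the repository, and 0.6 is an illustrative value rather than a recommended default.

```python
# Hypothetical low-VRAM / fast-inference sketch -- method and parameter
# names are assumptions based on the diffusers convention and the
# OmniGen2 README; verify against the repository before relying on them.
import torch
from omnigen2.pipelines.omnigen2.pipeline_omnigen2 import OmniGen2Pipeline  # assumed path

pipeline = OmniGen2Pipeline.from_pretrained(
    "OmniGen2/OmniGen2", torch_dtype=torch.bfloat16
)

# On GPUs with less than ~17 GB of VRAM, keep idle weights on the CPU and
# move each submodule to the GPU only while it runs (slower, but it fits).
pipeline.enable_model_cpu_offload()  # standard diffusers method; assumed here

result = pipeline(
    prompt="A red bicycle leaning against a brick wall",
    num_inference_steps=50,
    cfg_range_end=0.6,  # end classifier-free guidance early to cut compute;
                        # lower values trade a little quality for speed
)
result.images[0].save("output.png")
```

Since classifier-free guidance doubles the work of each denoising step it covers, ending it partway through the schedule removes that overhead for the remaining steps, which is why lowering cfg_range_end speeds up inference with little visible effect.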