VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Qianli Ma, Yaowei Zheng, Zhelun Shi, Zhongkai Zhao, Bin Jia, Ziyue Huang, Zhiqi Lin, Youjie Li, Jiacheng Yang, Yanghua Peng, Zhi Zhang, Xin Liu
2025-08-05
Summary
This paper introduces VeOmni, a flexible, modular training framework for building very large AI models that can work with different kinds of data, like text, images, and audio, all at once.
What's the problem?
Training these big multi-modal AI models is difficult because current training systems tie the model's design too closely to how the computational workload is split across machines, which makes them hard to scale and maintain.
What's the solution?
VeOmni solves this by decoupling the model's communication logic from how training is distributed across many machines, using what the authors call model-centric distributed recipes. This makes it possible to freely combine different parallelization strategies and to add new data types or model components with much less engineering work.
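To make the recipe idea concrete, here is a minimal sketch of what a "model-centric" parallelism recipe could look like. This is not VeOmni's actual API; the names `ParallelPlan`, `recipe`, and `total_shards` are hypothetical, and the sketch only illustrates the key idea: each model component is mapped to a parallel strategy in a standalone configuration, so adding a new modality means adding one entry rather than rewriting the training loop.

```python
# Hypothetical sketch (not VeOmni's real API): a model-centric recipe
# maps each model component to its own parallelism strategy, kept
# separate from the model's forward logic.
from dataclasses import dataclass

@dataclass
class ParallelPlan:
    strategy: str  # e.g. "fsdp", "tensor", "sequence" (illustrative labels)
    degree: int    # number of devices this module is sharded across

# The recipe is just a mapping from module name to plan. Swapping a
# strategy or adding a new modality encoder touches only this table.
recipe = {
    "text_encoder":   ParallelPlan("fsdp", 8),
    "vision_encoder": ParallelPlan("tensor", 4),
    "audio_encoder":  ParallelPlan("fsdp", 8),
    "decoder":        ParallelPlan("tensor", 8),
}

def total_shards(recipe):
    """Illustrative helper: total sharding degree across all modules."""
    return sum(plan.degree for plan in recipe.values())

print(total_shards(recipe))  # 28
```

The point of the design is that the recipe, not the model code, decides how each component is distributed, which is what lets strategies be composed and new modalities be added cheaply.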
Why it matters?
This matters because it makes training complex multi-modal AI models faster, more efficient, and easier to customize, helping researchers build more capable AI systems that can understand and generate information across multiple types of data.
Abstract
A modular training framework accelerates the development of omni-modal LLMs through efficient 3D parallelism and flexible configuration.