Open-Sora Plan: Open-Source Large Video Generation Model

Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiaoyi Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong

2024-12-03

Summary

This paper introduces Open-Sora Plan, an open-source project that provides a large model for generating high-resolution, long-duration videos from a variety of user inputs.

What's the problem?

Generating long, detailed, high-quality videos is difficult. Existing methods often struggle to produce videos that look realistic and maintain their quality over time, especially when they must also follow varied user instructions.

What's the solution?

Open-Sora Plan addresses these challenges by combining several components: a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. These components work together across the whole video generation process. The project also includes strategies for efficient training and inference, along with a multi-dimensional data curation pipeline for obtaining high-quality data. Together, these allow the model to generate videos that follow user specifications, whether given as text prompts or as other forms of input.

Why does it matter?

This research is significant because it makes it easier for people to create customized videos without needing extensive technical knowledge. By providing an open-source tool that anyone can use, Open-Sora Plan encourages innovation in video generation and can inspire new applications in areas like entertainment, education, and marketing.

Abstract

We introduce Open-Sora Plan, an open-source project that aims to contribute a large model for generating desired high-resolution, long-duration videos from various user inputs. Our project comprises multiple components covering the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers. In addition, we design many auxiliary strategies for efficient training and inference, and we propose a multi-dimensional data curation pipeline for obtaining the desired high-quality data. Benefiting from these efficiency-oriented designs, Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations. We hope our careful design and practical experience can inspire the video generation research community. All our code and model weights are publicly available at https://github.com/PKU-YuanGroup/Open-Sora-Plan.
