Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang, Dongya Jia, Feihu La, Duc Le, Bochen Li, Chumin Li, Hui Li, Xingxing Li, Shouda Liu, Wei-Tsung Lu, Yiqing Lu, Andrew Shaw, Janne Spijkervet

2024-09-17


Summary

This paper introduces Seed-Music, a suite of music generation systems for producing high-quality music with fine-grained control over its style and content.

What's the problem?

Creating music in a specific style or to specific requirements is challenging, especially for people without extensive musical training. Traditional music generation methods often lack the flexibility to control aspects such as the vocal performance, and offer little support for adjusting a melody once the music has been generated.

What's the solution?

Seed-Music addresses this with a unified framework that combines two families of techniques: auto-regressive language modeling and diffusion. It supports two workflows. For controlled generation, users can create vocal music from multiple inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production, it provides interactive tools for editing lyrics and vocal melodies directly in the generated audio, making it easier for users to create the music they envision.

Why does it matter?

This research is significant because it democratizes music creation, allowing more people to produce high-quality music tailored to their preferences. By giving users finer control over both generation and editing, Seed-Music opens up new opportunities for artists, content creators, and anyone interested in music production.

Abstract

We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio. We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music .
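The abstract does not spell out how the auto-regressive and diffusion components divide the work. As a toy illustration of one generic pattern (an assumption, not Seed-Music's published design), the Python sketch below lets an auto-regressive stage sample discrete tokens and a diffusion-style stage render them into a continuous signal. `Conditioning`, `generate_tokens`, `render`, the token rule, and the cosine schedule are all hypothetical; the "denoiser" is an oracle that predicts the clean signal directly from the tokens, standing in for a trained network.

```python
# Toy sketch of a generic "auto-regressive tokens -> diffusion rendering" pipeline.
# All names are hypothetical; this is NOT Seed-Music's actual architecture.
import math
import random
from dataclasses import dataclass, field


@dataclass
class Conditioning:
    """Hypothetical container for the multi-modal controls listed in the abstract."""
    style_description: str = ""
    audio_reference: list = field(default_factory=list)
    musical_score: list = field(default_factory=list)  # e.g. MIDI-like pitch numbers
    voice_prompt: list = field(default_factory=list)


def generate_tokens(cond, length, vocab_size=8, seed=0):
    """Stage 1: sample discrete tokens left to right, each conditioned on the past."""
    rng = random.Random(seed)
    tokens = []
    for i in range(length):
        if i < len(cond.musical_score):
            # Toy rule: where a score is given, it pins the token (strong conditioning).
            tokens.append(cond.musical_score[i] % vocab_size)
        else:
            prev = tokens[-1] if tokens else 0
            # Toy conditional distribution p(token | previous): favour small steps.
            weights = [1.0 / (1 + abs(v - prev)) for v in range(vocab_size)]
            tokens.append(rng.choices(range(vocab_size), weights=weights)[0])
    return tokens


def alpha_bar(s):
    """Cosine noise schedule: 1 at s = 0 (clean), about 0 at s = 1 (pure noise)."""
    return math.cos(s * math.pi / 2) ** 2


def render(tokens, vocab_size=8, steps=50, seed=0):
    """Stage 2: deterministic DDIM-style sampling (eta = 0) from Gaussian noise.

    The 'denoiser' predicts the clean signal x0 directly from the tokens; a real
    system would use a trained network conditioned on the tokens and noise level.
    """
    rng = random.Random(seed)
    x0_hat = [t / (vocab_size - 1) for t in tokens]  # oracle prediction in [0, 1]
    x = [rng.gauss(0.0, 1.0) for _ in tokens]        # start from pure noise
    for k in range(steps, 0, -1):
        a_t, a_prev = alpha_bar(k / steps), alpha_bar((k - 1) / steps)
        # Noise implied by the current sample and the x0 prediction.
        eps_hat = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1 - a_t)
                   for xi, x0i in zip(x, x0_hat)]
        # DDIM update: re-noise the x0 prediction to the next, lower noise level.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1 - a_prev) * e
             for x0i, e in zip(x0_hat, eps_hat)]
    return x


cond = Conditioning(style_description="upbeat pop", musical_score=[60, 62, 64])
tokens = generate_tokens(cond, length=6)
signal = render(tokens)
print(tokens)
print([round(v, 3) for v in signal])
```

In this toy, the sequential stage handles discrete structure (the first three tokens come straight from the score), while the iterative stage turns noise into a continuous signal. Because the oracle denoiser is exact, the final sample equals its x0 prediction; a learned denoiser would only approximate it.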
