SongCreator: Lyrics-based Universal Song Generation

Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

2024-09-11

Summary

This paper presents SongCreator, a system that generates complete songs, with both vocals and instrumental accompaniment, from given lyrics.

What's the problem?

Creating songs that include both vocals and instrumental accompaniment from just lyrics is a challenging task. Previous attempts at song generation have focused on parts of the process, like singing voice or instrumentals, but not on combining all these elements effectively. This limitation makes it hard to use music generation models in real-world applications.

What's the solution?

To address this challenge, the authors developed SongCreator, which features a dual-sequence language model (DSLM) that captures the information needed for both vocals and accompaniment. They also introduced an attention mask strategy for the DSLM that lets the model understand, generate, and edit songs, making it suitable for a range of song-related generation tasks. In extensive experiments, SongCreator achieved state-of-the-art or competitive performance on all eight tasks evaluated, and it outperformed previous work by a large margin on lyrics-to-song and lyrics-to-vocals generation.
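To make the idea of an attention mask over two sequences more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name `cross_attention_masks`, the four mode names, the assumption that the vocal and accompaniment sequences are frame-aligned with equal length, and the causal shape of the allowed region are all hypothetical choices made for illustration. The sketch only shows how switching a mask can change which track is allowed to look at the other.

```python
import numpy as np


def cross_attention_masks(num_frames, mode):
    """Build boolean masks for attention between two aligned sequences.

    Hypothetical illustration only. Returns a pair
    (vocals_see_accomp, accomp_see_vocals), each of shape
    (num_frames, num_frames). Entry [i, j] is True when query frame i
    of one track may attend to key frame j of the other track.
    """
    # Assumed causal pattern: frame i may look at frames 0..i of the other track.
    causal = np.tril(np.ones((num_frames, num_frames), dtype=bool))
    # Fully blocked pattern: no information flows in this direction.
    blocked = np.zeros((num_frames, num_frames), dtype=bool)

    if mode == "both":
        return causal.copy(), causal.copy()
    if mode == "vocals_from_accomp":
        # Vocals may look at the accompaniment, but not the reverse.
        return causal.copy(), blocked.copy()
    if mode == "accomp_from_vocals":
        # Accompaniment may look at the vocals, but not the reverse.
        return blocked.copy(), causal.copy()
    if mode == "none":
        # The two tracks are modeled independently.
        return blocked.copy(), blocked.copy()
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("both", "vocals_from_accomp", "accomp_from_vocals", "none"):
        vocals_mask, accomp_mask = cross_attention_masks(3, mode)
        print(mode, int(vocals_mask.sum()), int(accomp_mask.sum()))
```

Under these assumptions, a one-directional mode would suit a task where one track is given and the other is generated, while the "none" mode would suit generating a single track on its own. Which masks the paper actually pairs with each of its eight tasks is not stated in this summary.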

Why it matters?

This research matters because it brings lyrics-to-song generation closer to practical use, which could make it easier for artists and creators to produce songs with both vocals and accompaniment. Because the acoustic conditions of the vocals and the accompaniment can be controlled independently through different prompts, the system also gives creators finer control over the generated song.

Abstract

Music is an integral part of human culture, embodying human intelligence and creativity, of which songs compose an essential part. While various aspects of song generation have been explored by previous works, such as singing voice, vocal composition and instrumental arrangement, etc., generating songs with both vocals and accompaniment given lyrics remains a significant challenge, hindering the application of music generation models in the real world. In this light, we propose SongCreator, a song-generation system designed to tackle this challenge. The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and an additional attention mask strategy for DSLM, which allows our model to understand, generate and edit songs, making it suitable for various song-related generation tasks. Extensive experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks. Notably, it surpasses previous works by a large margin in lyrics-to-song and lyrics-to-vocals. Additionally, it is able to independently control the acoustic conditions of the vocals and accompaniment in the generated song through different prompts, exhibiting its potential applicability. Our samples are available at https://songcreator.github.io/.
