JAM, a flow-matching-based model, introduces word-level timing and duration control in song generation and uses Direct Preference Optimization to enhance audio quality, outperforming existing models in music-specific attributes.

This paper presents JAM, a small, efficient AI model that generates songs with precise control over the timing and duration of each word in the lyrics. It also improves the audio quality of the generated music.

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

Summary

What's the problem?

What's the solution?

Why it matters?

Abstract