YuE features a dual LLaMA language model architecture that lets it handle complex, multi-segment song structures up to five minutes long. Its design centers on a 'dual-track next-token prediction' strategy, which models the vocal and accompaniment tracks as separate, time-aligned token streams so that musical detail is preserved and the two tracks stay consistent with each other. The system also supports incremental song generation, batch processing, style transfer via audio prompts, and continuation of existing compositions. Users can experiment with different model settings, visualize their progress on an interactive timeline, and preview outputs before committing to a full refinement pass. YuE runs on consumer-grade GPUs when quantized models are used, though high-end GPUs are recommended for the best results.
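The dual-track idea can be sketched as interleaving two time-aligned codec-token streams into one sequence, so a single autoregressive model predicts a vocal token and an accompaniment token for each frame. The function names and token layout below are illustrative, not YuE's actual implementation:

```python
def interleave_dual_track(vocal_tokens, accomp_tokens):
    """Interleave per-frame vocal and accompaniment codec tokens into one
    sequence. A single next-token LM trained on this layout predicts both
    tracks while keeping them aligned frame by frame."""
    assert len(vocal_tokens) == len(accomp_tokens), "tracks must be time-aligned"
    seq = []
    for v, a in zip(vocal_tokens, accomp_tokens):
        seq.append(v)  # vocal token for frame t
        seq.append(a)  # accompaniment token for frame t
    return seq

def split_dual_track(seq):
    """Inverse operation: recover the two aligned tracks from model output."""
    return seq[0::2], seq[1::2]
```

Because the two tracks alternate at every frame, each accompaniment prediction can condition on the vocal token for the same frame, which is one way separate modeling can still keep the tracks musically consistent.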
YuE is released under the Apache 2.0 license, encouraging both academic research and commercial use with minimal restrictions. The model is compatible with popular deep learning frameworks and tools, and its open-source nature invites community contributions and ongoing improvement. With support for Chain-of-Thought (CoT) and In-Context Learning (ICL) generation modes, customizable inference parameters, and automatic download of language model checkpoints, YuE is positioned as a flexible, powerful tool for the next generation of music creation. Its emphasis on originality and avoidance of plagiarism helps ensure that generated songs are both unique and musically rich.
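As a rough illustration of how the two generation modes differ at inference time, the sketch below assembles a parameter set for a CoT run (lyrics and genre tags only) versus an ICL run (adding a reference audio prompt for style transfer). Every name here (`build_inference_config`, the parameter keys, the defaults) is hypothetical and does not reflect YuE's actual API:

```python
def build_inference_config(mode, lyrics, genre_tags, audio_prompt=None,
                           max_new_tokens=3000, repetition_penalty=1.1):
    """Assemble a parameter dict for a YuE-style generation run.

    'cot' (Chain of Thought) generates from lyrics and genre tags alone;
    'icl' (In-Context Learning) additionally conditions on a reference
    audio clip, which is how audio-prompt style transfer is exposed.
    """
    if mode not in ("cot", "icl"):
        raise ValueError("mode must be 'cot' or 'icl'")
    if mode == "icl" and audio_prompt is None:
        raise ValueError("ICL mode requires a reference audio prompt")
    cfg = {
        "mode": mode,
        "lyrics": lyrics,
        "genre_tags": genre_tags,
        "max_new_tokens": max_new_tokens,          # caps output length
        "repetition_penalty": repetition_penalty,  # discourages loops
    }
    if audio_prompt is not None:
        cfg["audio_prompt"] = audio_prompt
    return cfg
```

For example, `build_inference_config("icl", lyrics, ["pop", "female vocal"], audio_prompt="reference.mp3")` would describe a style-transfer run, while omitting the audio prompt selects plain lyrics-driven generation.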