DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho, Junwan Kim, Jisoo Kim, Minseo Kim, Mingu Kang, Sungeun Hong, Tae-Hyun Oh, Youngjae Yu
2024-12-02

Summary
This paper introduces DisCoRD, a method that decodes discrete motion tokens into smooth, continuous human motion using rectified flow, making it easier to generate realistic animations.
What's the problem?
Generating realistic human motion for animation is challenging. The dominant approaches quantize motion into discrete tokens (for example, with VQ-VAEs), which limits expressiveness and introduces frame-wise noise that shows up as jerky or unnatural movement. Continuous methods produce smoother motion, but they often struggle because motion data is high-dimensional and training data is limited.
What's the solution?
DisCoRD keeps the discrete motion tokens but replaces the usual decoder with rectified flow decoding. Instead of mapping tokens to motion in a single pass, the decoder refines the motion iteratively in continuous space, which captures fine-grained dynamics and yields smoother, more natural results. Because it only changes the decoding stage, DisCoRD is compatible with any framework built on discrete representations, and it improves naturalness without compromising faithfulness to the conditioning signals.
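The iterative refinement described above can be sketched as Euler integration of a velocity field from noise (t = 0) to motion (t = 1). This is a minimal illustration, not the authors' implementation: `decode_tokens`, `toy_velocity`, the codebook shape, and the step count are all hypothetical, and the closed-form velocity stands in for the neural network that a real system would train.

```python
import numpy as np

def decode_tokens(tokens, codebook, velocity_fn, num_steps=16, seed=0):
    """Decode discrete tokens into continuous features by Euler-integrating
    a velocity field from Gaussian noise (t=0) toward motion (t=1)."""
    rng = np.random.default_rng(seed)
    cond = codebook[tokens]                # (T, D) per-frame conditioning from token lookup
    x = rng.standard_normal(cond.shape)    # start from pure noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt                         # t runs over 0, dt, ..., 1 - dt
        x = x + dt * velocity_fn(x, t, cond)
    return x

def toy_velocity(x, t, cond):
    """Hypothetical stand-in for a learned network v(x_t, t, cond).
    On a straight path toward `cond`, the velocity is (cond - x_t) / (1 - t)."""
    return (cond - x) / (1.0 - t)

# Assumed toy setup: 8 codebook entries of dimension 4, a 6-frame token sequence.
codebook = np.random.default_rng(1).standard_normal((8, 4))
tokens = np.array([3, 3, 5, 1, 0, 7])
motion = decode_tokens(tokens, codebook, toy_velocity)
print(motion.shape)
```

With this toy velocity the trajectory is a straight line, so the result lands on the codebook features themselves. A trained velocity network would instead steer the sample toward realistic motion consistent with the tokens.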
Why does it matter?
This research is important because it improves how we create animations, making them look more realistic and fluid. By bridging the gap between discrete and continuous motion representations, DisCoRD can be used in various fields such as video games, movies, and virtual reality, where high-quality animations are essential for engaging experiences.
Abstract
Human motion, inherently continuous and dynamic, presents significant challenges for generative models. Despite their dominance, discrete quantization methods, such as VQ-VAEs, suffer from inherent limitations, including restricted expressiveness and frame-wise noise artifacts. Continuous approaches, while producing smoother and more natural motions, often falter due to high-dimensional complexity and limited training data. To resolve this "discord" between discrete and continuous representations, we introduce DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding, a novel method that decodes discrete motion tokens into continuous motion through rectified flow. By employing an iterative refinement process in the continuous space, DisCoRD captures fine-grained dynamics and ensures smoother and more natural motions. Compatible with any discrete-based framework, our method enhances naturalness without compromising faithfulness to the conditioning signals. Extensive evaluations demonstrate that DisCoRD achieves state-of-the-art performance, with FID of 0.032 on HumanML3D and 0.169 on KIT-ML. These results solidify DisCoRD as a robust solution for bridging the divide between discrete efficiency and continuous realism. Our project page is available at: https://whwjdqls.github.io/discord.github.io/.
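For readers unfamiliar with rectified flow, the standard formulation is summarized below. The symbols are assumptions for illustration and are not taken from the abstract: x_0 is Gaussian noise, x_1 is a ground-truth motion sequence, c is the conditioning derived from the discrete tokens, and v_θ is the learned velocity network. The paper's exact conditioning and parameterization may differ.

```latex
% Straight-line interpolation between noise x_0 and data x_1
x_t = (1 - t)\, x_0 + t\, x_1, \qquad x_0 \sim \mathcal{N}(0, I), \quad t \in [0, 1]

% Training objective: regress the constant velocity of the straight path
\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_1}
  \left\| v_\theta(x_t, t, c) - (x_1 - x_0) \right\|^2

% Decoding: integrate the ODE from t = 0 to t = 1, e.g. with Euler steps
\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_\theta(x_t, t, c), \qquad
x_{t + \Delta t} = x_t + \Delta t \, v_\theta(x_t, t, c)
```

Each Euler step is one round of the iterative refinement in continuous space that the abstract describes.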