
FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

Ruocheng Wang, Pei Xu, Haochen Shi, Elizabeth Schumann, C. Karen Liu

2024-10-10


Summary

This paper presents FürElise, a large dataset and accompanying method for capturing and physically synthesizing the complex hand motions of piano players to create realistic animations.

What's the problem?

Piano playing requires highly skilled hand movements that are difficult to replicate in animations. Existing methods often fail to accurately capture the intricate motions involved in playing the piano, which limits their use in applications like character animation, virtual reality, and robotics.

What's the solution?

To solve this problem, the authors created a large dataset called FürElise, which includes about 10 hours of 3D hand motion data from 15 expert pianists performing 153 classical pieces. They used a markerless setup that reconstructs hand movements from multiple camera views with pose estimation models, so performers can play naturally without markers on their hands. The motion data is then refined using inverse kinematics guided by high-resolution MIDI key-press data recorded by sensors in a Yamaha Disklavier piano. The authors also developed a system that generates realistic hand motions for new pieces of music by combining a diffusion model, which produces reference motions and fingering, with imitation learning and reinforcement learning for physics-based control.
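To give a feel for the MIDI-based refinement step, here is a toy sketch of the idea: if we know each key's 3D position and the MIDI-reported press events, we can measure how far an estimated fingertip is from the key it actually pressed and nudge it toward that key. The keyboard layout, function names, and the simple clamped correction below are illustrative assumptions, not the paper's actual inverse-kinematics solver.

```python
# Toy sketch (not the paper's solver): use MIDI key-press events to correct
# a fingertip position estimated from multi-view video. We assume we know
# the 3D center of every piano key and which finger pressed which key.
import numpy as np

# Hypothetical keyboard model: 88 key centers along the x-axis, 23.5 mm pitch.
KEY_PITCH_M = 0.0235
key_centers = np.stack(
    [np.arange(88) * KEY_PITCH_M,   # x: position along the keyboard
     np.zeros(88),                  # y: depth (toward the player)
     np.zeros(88)], axis=1)         # z: height of the key surface

def refine_fingertip(estimated_tip, pressed_key, max_correction=0.02):
    """Pull an estimated fingertip toward the key it pressed (per MIDI).

    estimated_tip : (3,) fingertip position from markerless pose estimation
    pressed_key   : index of the key reported as pressed by the piano sensors
    max_correction: cap on how far we trust the MIDI evidence (meters)
    """
    target = key_centers[pressed_key]
    error = target - estimated_tip
    # Clamp the correction so gross pose-estimation failures are not
    # silently "fixed" into an implausible hand configuration.
    step = np.clip(error, -max_correction, max_correction)
    return estimated_tip + step

# Example: vision says the fingertip is a few millimeters off the pressed key.
noisy_tip = key_centers[39] + np.array([0.008, 0.003, 0.002])
print(refine_fingertip(noisy_tip, pressed_key=39))
```

In the actual pipeline this correction is applied through inverse kinematics over the whole hand rather than by moving a single point, but the sketch shows why the MIDI data is valuable: it pins down exactly which keys were touched and when.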

Why it matters?

This research is important because it advances the ability to create lifelike animations of piano playing, which can be used in various fields such as animation, gaming, and music education. By providing a way to synthesize realistic hand movements, this work can enhance user experiences in virtual environments and improve training tools for aspiring pianists.

Abstract

Piano playing requires agile, precise, and coordinated hand control that stretches the limits of dexterity. Hand motion models with the sophistication to accurately recreate piano playing have a wide range of applications in character animation, embodied AI, biomechanics, and VR/AR. In this paper, we construct a first-of-its-kind large-scale dataset that contains approximately 10 hours of 3D hand motion and audio from 15 elite-level pianists playing 153 pieces of classical music. To capture natural performances, we designed a markerless setup in which motions are reconstructed from multi-view videos using state-of-the-art pose estimation models. The motion data is further refined via inverse kinematics using the high-resolution MIDI key-pressing data obtained from sensors in a specialized Yamaha Disklavier piano. Leveraging the collected dataset, we developed a pipeline that can synthesize physically-plausible hand motions for musical scores outside of the dataset. Our approach employs a combination of imitation learning and reinforcement learning to obtain policies for physics-based bimanual control involving the interaction between hands and piano keys. To solve the sampling efficiency problem with the large motion dataset, we use a diffusion model to generate natural reference motions, which provide high-level trajectory and fingering (finger order and placement) information. However, the generated reference motion alone does not provide sufficient accuracy for piano performance modeling. We then further augmented the data by using musical similarity to retrieve similar motions from the captured dataset to boost the precision of the RL policy. With the proposed method, our model generates natural, dexterous motions that generalize to music from outside the training dataset.
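As a rough illustration of how the physics-based control objective might be structured, the sketch below combines a key-press term (did the simulated hands depress the keys the score demands at this timestep?) with an imitation term that keeps the policy close to a reference hand pose such as one produced by the diffusion model. The weights, joint-vector size, and function names are assumptions for illustration, not values or code from the paper.

```python
# Illustrative sketch (assumed structure, not the paper's exact reward):
# an RL reward for one control step that trades off hitting the right keys
# against staying close to a reference hand pose.
import numpy as np

def piano_reward(sim_keys_down, target_keys_down, sim_pose, ref_pose,
                 w_press=1.0, w_wrong=0.5, w_imitate=0.2):
    """sim_keys_down / target_keys_down: boolean arrays over the 88 keys.
    sim_pose / ref_pose: flattened joint-angle vectors for both hands."""
    correct = np.logical_and(sim_keys_down, target_keys_down).sum()
    wrong = np.logical_and(sim_keys_down, ~target_keys_down).sum()
    # Dense imitation term: exponential of negative pose error keeps the
    # reward bounded and encourages matching the reference fingering.
    pose_error = np.linalg.norm(sim_pose - ref_pose)
    imitation = np.exp(-pose_error)
    return w_press * correct - w_wrong * wrong + w_imitate * imitation

# Example step: the policy presses key 40 correctly but also grazes key 41.
target = np.zeros(88, dtype=bool); target[40] = True
sim = np.zeros(88, dtype=bool); sim[[40, 41]] = True
print(piano_reward(sim, target, np.zeros(46), np.full(46, 0.05)))
```

A reward of this general shape explains why the reference motions matter: the key-press term alone is sparse and hard to optimize, while the imitation term supplies dense guidance about trajectory and fingering.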