
Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation

Daniel Kienzle, Katja Ludwig, Julian Lorenz, Shin'ichi Satoh, Rainer Lienhart

2025-11-26

Summary

This paper focuses on recovering the exact 3D path and spin of a ping pong ball just by watching a regular single-camera video, which is harder than it sounds!

What's the problem?

Current methods for tracking ping pong balls in 3D are trained on computer-generated videos, but they don't transfer well to real footage because real videos are messy: the ball and table aren't always detected correctly, and recording conditions vary. The biggest issue is that it's really hard to get accurate 3D ground-truth data from real ping pong games to train these systems on.

What's the solution?

The researchers came up with a two-step process. First, they built a system to accurately identify the ball and table in the video, trained on a large dataset of 2D annotations they created (called TTHQ). Then, a separate system, trained purely on physically correct computer-generated data, takes that 2D information and estimates the 3D position and spin of the ball. They also made this second system more resilient to real-world problems like missing ball detections and videos with inconsistent frame rates.
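To make the two-step idea concrete, here is a minimal sketch of how the hand-off between the stages might look: the front-end produces per-frame 2D ball detections (some of which are missing), and the input to the back-end uplifting network is packed together with a validity mask and per-frame time deltas so it can cope with dropped detections and uneven frame rates. The function name and data layout are hypothetical illustrations, not the paper's actual interface.

```python
import numpy as np

def build_uplift_input(detections_2d, timestamps):
    """Pack noisy per-frame 2D ball detections into fixed-size arrays.

    detections_2d: list of (x, y) pixel coordinates, or None for frames
                   where the detector missed the ball.
    timestamps:    per-frame times in seconds (need not be evenly spaced,
                   so varying frame rates are handled naturally).

    Returns (coords, mask, dt): zero-filled coordinates with a boolean
    validity mask, plus the time gap to the previous frame.
    """
    n = len(detections_2d)
    coords = np.zeros((n, 2), dtype=np.float32)
    mask = np.zeros(n, dtype=bool)
    for i, det in enumerate(detections_2d):
        if det is not None:
            coords[i] = det   # keep the real detection
            mask[i] = True    # mark this frame as valid
    # Time delta between consecutive frames; dt[0] is 0 by construction.
    dt = np.diff(np.asarray(timestamps, dtype=np.float32),
                 prepend=np.float32(timestamps[0]))
    return coords, mask, dt

# Example: three frames at 60 fps, with the middle detection missing.
coords, mask, dt = build_uplift_input(
    [(100.0, 200.0), None, (110.0, 195.0)],
    [0.0, 1 / 60, 2 / 60],
)
```

A learned uplifting model would then consume `coords`, `mask`, and `dt` together, so it learns to skip over masked-out frames instead of being fed garbage coordinates.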

Why it matters?

This work is important because it makes it possible to accurately analyze ping pong games using just a standard video camera. That could help with training players, analyzing professional matches, or even developing better robotic ping pong players, and it's a step toward making 3D motion tracking more reliable in real-world situations.

Abstract

Obtaining the precise 3D motion of a table tennis ball from standard monocular videos is a challenging problem, as existing methods trained on synthetic data struggle to generalize to the noisy, imperfect ball and table detections of the real world. This is primarily due to the inherent lack of 3D ground truth trajectories and spin annotations for real-world video. To overcome this, we propose a novel two-stage pipeline that divides the problem into a front-end perception task and a back-end 2D-to-3D uplifting task. This separation allows us to train the front-end components with abundant 2D supervision from our newly created TTHQ dataset, while the back-end uplifting network is trained exclusively on physically-correct synthetic data. We specifically re-engineer the uplifting model to be robust to common real-world artifacts, such as missing detections and varying frame rates. By integrating a ball detector and a table keypoint detector, our approach transforms a proof-of-concept uplifting method into a practical, robust, and high-performing end-to-end application for 3D table tennis trajectory and spin analysis.