Ghost-FWL: A Large-Scale Full-Waveform LiDAR Dataset for Ghost Detection and Removal
Kazuma Ikeda, Ryosei Hara, Rokuto Nagata, Ozora Sako, Zihao Ding, Takahiro Kado, Ibuki Fujioka, Taro Beppu, Mariko Isogawa, Kentaro Yoshioka
2026-04-01
Summary
This research focuses on improving the accuracy of LiDAR, a sensing technology used in self-driving cars and robotics, by tackling the issue of 'ghost points'. These are essentially false readings caused by light bouncing off surfaces like glass before reaching the sensor, which can throw off the system's understanding of the environment.
What's the problem?
LiDAR systems sometimes produce incorrect 3D maps because of these 'ghost points'. Existing ghost removal methods rely on geometric consistency in dense point clouds, so they work well when many data points lie close together but struggle with the sparser, rapidly changing data collected by mobile LiDAR systems, such as those mounted on a moving car. In short, current ghost removal techniques are unreliable when the environment is dynamic and the point cloud is sparse.
What's the solution?
The researchers used a type of LiDAR called 'full-waveform LiDAR' (FWL), which records the entire signal of the returning laser pulse over time, not just the distance to an object. This extra information helps distinguish real reflections from 'ghost' reflections. They also created a large new dataset called Ghost-FWL, with 24K frames across 10 scenes and 7.5 billion peak-level annotations, to train and test their methods. Using this dataset, they developed a baseline ghost detection model and FWL-MAE, a masked autoencoder that learns representations from the LiDAR data itself, without needing as much manual labeling.
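To make the difference between a full waveform and a peak-only measurement concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the Gaussian pulse model, the bin counts, the threshold, and the helper names `synth_waveform`, `find_peaks`, and `width_at_half_max` are hypothetical and do not come from the Ghost-FWL code or from any sensor specification. It only shows that a waveform keeps the shape of each return, which a list of peak positions discards.

```python
import math

def synth_waveform(n_bins, pulses):
    """Build a synthetic waveform as a sum of Gaussian pulses.
    pulses: list of (center_bin, amplitude, sigma) tuples (hypothetical values)."""
    wave = []
    for t in range(n_bins):
        v = 0.0
        for center, amp, sigma in pulses:
            v += amp * math.exp(-0.5 * ((t - center) / sigma) ** 2)
        wave.append(v)
    return wave

def find_peaks(wave, threshold):
    """Return (bin, intensity) for every local maximum above threshold.
    This is the peak-only view: position and height, nothing else."""
    peaks = []
    for t in range(1, len(wave) - 1):
        if wave[t] > threshold and wave[t] > wave[t - 1] and wave[t] >= wave[t + 1]:
            peaks.append((t, wave[t]))
    return peaks

def width_at_half_max(wave, peak_bin):
    """Count contiguous bins around peak_bin whose value is at least half the peak.
    This shape cue is only available when the full waveform is kept."""
    half = wave[peak_bin] / 2.0
    left = peak_bin
    while left > 0 and wave[left - 1] >= half:
        left -= 1
    right = peak_bin
    while right < len(wave) - 1 and wave[right + 1] >= half:
        right += 1
    return right - left + 1

# One beam with two returns: a weak, broad one and a strong, narrow one.
wave = synth_waveform(200, [(40, 0.3, 6.0), (120, 1.0, 3.0)])
all_peaks = find_peaks(wave, threshold=0.1)
peak_bins = [b for b, _ in all_peaks]
widths = [width_at_half_max(wave, b) for b in peak_bins]

for (b, intensity), w in zip(all_peaks, widths):
    print(f"bin={b} intensity={intensity:.2f} half-max width={w} bins")
```

The peak list alone reports two returns at different ranges. The pulse width is one example of an additional per-return feature that can be read from the waveform; which features actually separate ghosts from genuine reflections is what the learned models in the paper are meant to discover, and this sketch makes no claim about that.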
Why does it matter?
This work is important because it improves the reliability of LiDAR in real-world applications. With ghost points removed, a system can build more accurate maps, track its own location better (SLAM), and identify surrounding objects more reliably. In the experiments, ghost removal reduced trajectory error in LiDAR-based SLAM by 66% and reduced false positives in 3D object detection by 50x, bringing safer and more effective self-driving cars and robots closer to reality. The researchers are also sharing their dataset and code so others can build on the work.
Abstract
LiDAR has become an essential sensing modality in autonomous driving, robotics, and smart-city applications. However, ghost points (or ghosts), which are false reflections caused by multi-path laser returns from glass and reflective surfaces, severely degrade 3D mapping and localization accuracy. Prior ghost removal relies on geometric consistency in dense point clouds, failing on mobile LiDAR's sparse, dynamic data. We address this by exploiting full-waveform LiDAR (FWL), which captures complete temporal intensity profiles rather than just peak distances, providing crucial cues for distinguishing ghosts from genuine reflections in mobile scenarios. As this is a new task, we present Ghost-FWL, the first and largest annotated mobile FWL dataset for ghost detection and removal. Ghost-FWL comprises 24K frames across 10 diverse scenes with 7.5 billion peak-level annotations, which is 100x larger than existing annotated FWL datasets. Benefiting from this large-scale dataset, we establish an FWL-based baseline model for ghost detection and propose FWL-MAE, a masked autoencoder for efficient self-supervised representation learning on FWL data. Experiments show that our baseline outperforms existing methods in ghost removal accuracy, and our ghost removal further enhances downstream tasks such as LiDAR-based SLAM (66% trajectory error reduction) and 3D object detection (50x false positive reduction). The dataset and code are publicly available and can be accessed via the project page: https://keio-csg.github.io/Ghost-FWL
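The abstract does not describe how FWL-MAE is built, so the following is only a generic sketch of the masked-autoencoder idea applied to a 1D waveform: hide a large fraction of the input, then score a reconstruction only on the hidden part. The patch size, the mask ratio, the zero-fill masking, and the function names `mask_waveform` and `masked_mse` are all assumptions for illustration, not the authors' design. Typical masked autoencoders drop masked patches from the encoder input rather than zeroing them; zero-fill is used here to keep the example short.

```python
import random

def mask_waveform(wave, patch_size, mask_ratio, rng):
    """Split wave into fixed-size patches and zero out a random subset.
    Returns the masked copy and the sorted indices of the masked patches.
    Trailing bins that do not fill a whole patch are left unmasked."""
    n_patches = len(wave) // patch_size
    n_masked = round(n_patches * mask_ratio)
    masked_ids = sorted(rng.sample(range(n_patches), n_masked))
    masked = list(wave)
    for p in masked_ids:
        for t in range(p * patch_size, (p + 1) * patch_size):
            masked[t] = 0.0
    return masked, masked_ids

def masked_mse(pred, target, masked_ids, patch_size):
    """Mean squared error computed over the masked bins only,
    so the score reflects how well hidden content was reconstructed."""
    total, count = 0.0, 0
    for p in masked_ids:
        for t in range(p * patch_size, (p + 1) * patch_size):
            total += (pred[t] - target[t]) ** 2
            count += 1
    return total / count if count else 0.0

rng = random.Random(0)                      # fixed seed for repeatability
wave = [float(t % 10) for t in range(80)]   # placeholder signal, not real FWL data
masked_wave, masked_ids = mask_waveform(wave, patch_size=8, mask_ratio=0.7, rng=rng)

# A model that just echoes its masked input reconstructs nothing.
loss_no_reconstruction = masked_mse(masked_wave, wave, masked_ids, patch_size=8)
# A perfect reconstruction has zero loss on the masked bins.
loss_perfect = masked_mse(wave, wave, masked_ids, patch_size=8)
print(len(masked_ids), loss_no_reconstruction, loss_perfect)
```

In a real training loop, a network would take the visible portion of the waveform and be optimized to drive the masked-bin loss toward zero, which requires no ghost labels. That is what makes the objective self-supervised.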