C-GenReg: Training-Free 3D Point Cloud Registration by Multi-View-Consistent Geometry-to-Image Generation with Probabilistic Modalities Fusion
Yuval Haitman, Amit Efraim, Joseph M. Francos
2026-04-23
Summary
This paper introduces a new method, C-GenReg, for aligning 3D point clouds – basically, matching up two different 3D scans of the same scene so they line up in space. It does this without needing any extra training, by cleverly combining the strengths of existing AI models: ones that are good at understanding 3D shapes and ones that are good at understanding 2D images.
What's the problem?
Current methods for aligning 3D point clouds often struggle when the scans are taken with different sensors, have different levels of detail, or are from different environments. They're not very good at generalizing to new situations. Imagine trying to match a detailed scan from a fancy laser scanner with a quick scan from a phone – it's hard for computers to see they're the same thing.
What's the solution?
C-GenReg solves this by taking the 3D point cloud data and using an AI model to *create* realistic 2D images from different viewpoints. Then, it uses another AI model, one that's already really good at finding matching points in images, to find correspondences between these generated images. Finally, it translates those 2D matches back into 3D, combining them with the direct 3D matches to get a more accurate and reliable alignment. It's like getting a second opinion from a different type of expert.
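The last step of the pipeline above – turning 2D pixel matches back into a 3D alignment – can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the paper's code: it back-projects matched pixels through a standard pinhole camera model using the depth maps, then recovers a rigid transform with the classical Kabsch/SVD solver (the paper's exact estimator may differ).

```python
import numpy as np

def lift_to_3d(pixels, depth, K):
    """Back-project pixel coordinates (N, 2) to 3D camera-frame points
    using a depth map and pinhole intrinsics K. Illustrative only."""
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v.astype(int), u.astype(int)]          # depth indexed as [row, col]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def rigid_align(P, Q):
    """Kabsch/SVD: rotation R and translation t minimizing ||R P + t - Q||.
    Generic closed-form solver, not C-GenReg's specific estimator."""
    cP, cQ = P.mean(0), Q.mean(0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

Once both views' matched pixels are lifted this way, `rigid_align` gives the pose that maps the source scan onto the target.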
Why it matters?
This work is important because it allows for accurate 3D alignment even when you don't have a lot of training data or when the scans are very different. It's the first method to successfully align outdoor LiDAR scans (used in self-driving cars) without needing any actual images, which is a big step forward for applications like mapping and robotics.
Abstract
We introduce C-GenReg, a training-free framework for 3D point cloud registration that leverages the complementary strengths of world-scale generative priors and registration-oriented Vision Foundation Models (VFMs). Current learning-based 3D point cloud registration methods struggle to generalize across sensing modalities, sampling differences, and environments. C-GenReg therefore augments the geometric point cloud registration branch by transferring the matching problem into an auxiliary image domain, where VFMs excel, using a World Foundation Model to synthesize multi-view-consistent RGB representations from the input geometry. This generative transfer preserves spatial coherence across source and target views without any fine-tuning. From these generated views, a VFM pretrained for dense correspondence estimation extracts matches. The resulting pixel correspondences are lifted back to 3D via the original depth maps. To further enhance robustness, we introduce a "Match-then-Fuse" probabilistic cold-fusion scheme that combines two independent correspondence posteriors: that of the generated-RGB branch and that of the raw geometric branch. This principled fusion preserves each modality's inductive bias and provides calibrated confidence without any additional learning. C-GenReg is zero-shot and plug-and-play: all modules are pretrained and operate without fine-tuning. Extensive experiments on indoor (3DMatch, ScanNet) and outdoor (Waymo) benchmarks demonstrate strong zero-shot performance and superior cross-domain generalization. For the first time, we demonstrate a generative registration framework that operates successfully on real outdoor LiDAR data, where no image data is available.
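To make the "Match-then-Fuse" idea concrete, one simple way to combine two independent correspondence posteriors is a product-of-experts rule with renormalization. The sketch below is an assumption for illustration – the paper's exact fusion weighting is not specified here – where each row of `p_rgb` and `p_geo` is a source point's distribution over candidate target matches.

```python
import numpy as np

def fuse_posteriors(p_rgb, p_geo, eps=1e-12):
    """Fuse two independent per-correspondence posteriors (each row a
    distribution over candidate target points) by elementwise product
    and renormalization. Illustrative product-of-experts rule, not
    necessarily the paper's exact "Match-then-Fuse" scheme."""
    fused = (np.asarray(p_rgb) + eps) * (np.asarray(p_geo) + eps)
    return fused / fused.sum(axis=1, keepdims=True)
```

Under this rule a candidate match is confident only when both branches agree on it, which is the intuition behind fusing the generated-RGB and raw geometric evidence.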