UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

Ziyi Wang, Yanran Zhang, Jie Zhou, Jiwen Lu

2025-06-15

Summary

This paper introduces UniPre3D, a pre-training method for AI models that work with 3D point clouds, which are collections of points that represent objects or scenes in three dimensions. UniPre3D uses Gaussian primitives (smooth, blob-like 3D shapes) together with information from 2D images, so that the model learns effectively from 3D data of different sizes and for tasks involving both individual objects and whole scenes.

What's the problem?

Training AI models to understand 3D point clouds is difficult because the data is complex and varies widely in size and type, from small objects to large scenes. Existing methods struggle to handle these cases together and don't always make good use of related 2D image features, which can guide learning toward better results.

What's the solution?

The researchers developed UniPre3D, which unifies pre-training for 3D point clouds by using Gaussian splatting, a technique that represents 3D content as smooth Gaussian shapes that can be rendered into images, and by integrating features from 2D images. This combination helps the model learn representations that work well across a variety of 3D tasks, whether on single objects or entire scenes, and it works with models of different sizes.
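
To give a feel for what "splatting" means, here is a minimal sketch in Python with NumPy. It is not the UniPre3D implementation: it assumes the Gaussians have already been projected to 2D, treats each one as a round (isotropic) blob, and blends them from nearest to farthest. The function name `splat_gaussians` and all of its parameters are made up for this illustration.

```python
import numpy as np

def splat_gaussians(means, depths, sigmas, colors, opacities, height, width):
    """Render isotropic 2D Gaussians with front-to-back alpha compositing.

    Illustrative assumption: means are (N, 2) pixel coordinates (x, y),
    depths/sigmas/opacities are (N,), and colors are (N, 3) RGB in [0, 1].
    """
    ys, xs = np.mgrid[0:height, 0:width]
    image = np.zeros((height, width, 3))
    # Fraction of light still passing through each pixel; starts fully open.
    transmittance = np.ones((height, width))

    for i in np.argsort(depths):  # nearest Gaussian first
        dx = xs - means[i, 0]
        dy = ys - means[i, 1]
        # Gaussian falloff: 1 at the center, approaching 0 far away.
        weight = np.exp(-(dx**2 + dy**2) / (2.0 * sigmas[i] ** 2))
        alpha = opacities[i] * weight
        image += (transmittance * alpha)[..., None] * colors[i]
        transmittance *= 1.0 - alpha

    return image, transmittance

# Two overlapping blobs: a red one in front and a green one behind it.
means = np.array([[4.0, 4.0], [6.0, 4.0]])
depths = np.array([1.0, 2.0])
sigmas = np.array([1.5, 1.5])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
opacities = np.array([0.9, 0.9])
image, _ = splat_gaussians(means, depths, sigmas, colors, opacities, 9, 11)
```

Every step in this loop is a smooth arithmetic operation, which is why a rendered image can be compared with a real photo and the error used to adjust the Gaussians. Real Gaussian splatting also handles 3D-to-2D camera projection and stretched (anisotropic) Gaussians, both of which this sketch leaves out.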

Why does it matter?

Better pre-training methods for 3D point clouds improve how well AI understands and uses 3D data, which is important for applications like virtual reality, robotics, and autonomous vehicles. By improving how models learn from 3D data, UniPre3D makes it easier to build AI systems that perceive and interact with the 3D world more accurately and efficiently.

Abstract

UniPre3D is a unified pre-training method for 3D point clouds and models of any scale, using Gaussian primitives and 2D feature integration for effective performance across object and scene tasks.
