Memorization in 3D Shape Generation: An Empirical Study
Shu Pu, Boya Zeng, Kaichen Zhou, Mengyu Wang, Zhuang Liu
2026-01-09
Summary
This paper investigates whether 3D generative models, which create new 3D shapes, are actually just copying the shapes they were trained on instead of truly *generating* new ones.
What's the problem?
As these 3D models become more popular, there is a growing concern that they may simply be memorizing their training data. A model that merely memorizes cannot create truly novel designs and could even leak private data from the training set. Until now, there has been no good way to measure how much these models memorize, or which factors make memorization worse.
What's the solution?
The researchers built an evaluation framework to measure memorization in these 3D models, then used it to test different models and settings. They found that memorization depends on the type of 3D data used and on how varied the data is, and that it also depends on modeling choices: it peaks at a moderate guidance scale (how strongly the model is steered during generation). They also discovered that simple tricks, such as using longer latent vector sets and rotating the training data, can reduce memorization.
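The summary does not spell out the paper's exact metric, but a common way to quantify memorization in shape generation is to compare each generated shape against its nearest neighbor in the training set, e.g. via Chamfer distance between point clouds, and count how many generated shapes fall below a closeness threshold. The sketch below is illustrative only; the function names, the brute-force nearest-neighbor search, and the threshold value are assumptions, not the authors' implementation.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N,3) and b (M,3)."""
    # Pairwise Euclidean distances via broadcasting: shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Average nearest-neighbor distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def memorization_rate(generated, training, threshold=0.05):
    """Fraction of generated shapes whose nearest training shape is
    closer than `threshold` (a hypothetical cutoff for 'copied')."""
    copied = 0
    for g in generated:
        nearest = min(chamfer_distance(g, t) for t in training)
        if nearest < threshold:
            copied += 1
    return copied / len(generated)
```

For example, a generated shape identical to a training shape has Chamfer distance 0 and counts as memorized, while a shape far from every training example does not, so the rate captures how often the model reproduces its training data.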
Why does it matter?
Understanding and reducing memorization is important because it helps ensure these 3D generative models can create genuinely new and diverse designs, and it protects against potential privacy issues related to the training data. The strategies the authors found are easy to implement and reduce memorization without degrading the quality of the generated 3D shapes.
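One of the mitigations highlighted above, rotation augmentation, amounts to applying a random rotation to each training shape so the model cannot latch onto one fixed orientation. A minimal sketch for point clouds, assuming rotation about the vertical (z) axis; the axis choice and function name are illustrative, not taken from the paper:

```python
import numpy as np

def random_rotation_z(points, rng=None):
    """Rotate a point cloud (N,3) by a uniformly random angle about the z-axis."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    # Standard 3x3 rotation matrix about z; x and y mix, z is unchanged.
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T
```

Applied on the fly during training, each shape is seen in a different orientation every epoch, which discourages the model from memorizing any single pose.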
Abstract
Generative models are increasingly used in 3D vision to synthesize novel shapes, yet it remains unclear whether their generation relies on memorizing training shapes. Understanding their memorization could help prevent training data leakage and improve the diversity of generated results. In this paper, we design an evaluation framework to quantify memorization in 3D generative models and study the influence of different data and modeling designs on memorization. We first apply our framework to quantify memorization in existing methods. Next, through controlled experiments with a latent vector-set (Vecset) diffusion model, we find that, on the data side, memorization depends on data modality, and increases with data diversity and finer-grained conditioning; on the modeling side, it peaks at a moderate guidance scale and can be mitigated by longer Vecsets and simple rotation augmentation. Together, our framework and analysis provide an empirical understanding of memorization in 3D generative models and suggest simple yet effective strategies to reduce it without degrading generation quality. Our code is available at https://github.com/zlab-princeton/3d_mem.