Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions
Etai Sella, Hao Phung, Nitay Amiel, Or Litany, Or Patashnik, Hadar Averbuch-Elor
2026-05-04
Summary
This paper introduces a new method, called Prox-E, for editing 3D shapes using text instructions. It focuses on making precise, localized changes to 3D models without ruining the overall form.
What's the problem?
Current methods for editing 3D models often rely on first editing a 2D image of the model. While these methods work well for changing an object's appearance, they struggle when you need to make specific structural changes to the 3D shape itself, like adding a handle or changing the shape of a leg, while keeping the rest of the object looking the same. They often distort the overall shape or fail to follow the instructions accurately.
What's the solution?
Prox-E works by first breaking down the 3D model into simple geometric shapes, like cubes and spheres. Then, a vision-language model (VLM), an AI model that understands both images and language, translates the text instruction into edits of these basic shapes, for example 'make the sphere bigger' or 'move the cube to the left'. Finally, a 3D generative model uses these primitive-level edits to rebuild the 3D model, incorporating the changes while preserving the original shape as much as possible. Importantly, this method doesn't require any additional training; it uses pre-existing AI models.
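To make the primitive-level editing idea concrete, here is a toy sketch in Python. It is not the paper's implementation: the `Primitive` schema and `apply_primitive_edit` function are hypothetical stand-ins, and the role the VLM plays in Prox-E (turning a text instruction into an edit of specific primitives) is replaced by a hand-written edit.

```python
from dataclasses import dataclass, replace

# Toy stand-in for the primitive abstraction: each primitive has an id,
# a type, a center, and a uniform scale. (Hypothetical schema, not the
# paper's actual representation.)
@dataclass(frozen=True)
class Primitive:
    pid: str
    kind: str                          # e.g. "cube", "sphere"
    center: tuple[float, float, float] # (x, y, z)
    scale: float

def apply_primitive_edit(primitives, pid, *, new_center=None, new_scale=None):
    """Apply a localized edit to one primitive, leaving the rest untouched.
    In Prox-E, a VLM would emit edits like this from a text instruction."""
    out = []
    for p in primitives:
        if p.pid == pid:
            p = replace(
                p,
                center=new_center if new_center is not None else p.center,
                scale=new_scale if new_scale is not None else p.scale,
            )
        out.append(p)
    return out

# A chair abstracted into two primitives.
chair = [
    Primitive("seat", "cube", (0.0, 0.5, 0.0), 1.0),
    Primitive("leg",  "cube", (0.4, 0.0, 0.4), 0.2),
]

# Instruction "make the leg thicker" -> scale only the 'leg' primitive.
edited = apply_primitive_edit(chair, "leg", new_scale=0.3)
```

The key property this illustrates is locality: the edit names one primitive, so all other primitives (and hence the unchanged regions of the shape) are passed through untouched, which is what lets the final generative step preserve the object's identity.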
Why it matters?
This research is important because it allows for much more precise and controlled editing of 3D models. This is useful for a lot of applications, like designing products, creating characters for games, or customizing objects. By focusing on structural edits and preserving the original shape, Prox-E offers a significant improvement over existing 3D editing techniques.
Abstract
Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.