CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, Shenghua Gao
2024-11-11

Summary
This paper introduces CAD-MLLM, a new system designed to generate Computer-Aided Design (CAD) models based on various user inputs, such as text descriptions, images, and point clouds.
What's the problem?
Creating CAD models is complicated and time-consuming, and design intent often arrives in several formats at once, such as a rough sketch, a reference photo, or a 3D scan. Existing generative methods typically accept only a single input modality, so they cannot combine these different types of data effectively, which limits the flexibility of the CAD generation process.
What's the solution?
CAD-MLLM addresses this problem with a unified framework that accepts any combination of text, images, and point clouds as conditions for generating parametric CAD models. A large language model (LLM) aligns the features of these modalities with the CAD models' vectorized command-sequence representation, so the same network handles every input type. To train it, the authors build Omni-CAD, a new dataset of roughly 450,000 CAD models, each paired with a textual description, multi-view images, a point cloud, and its construction command sequence. A rough sketch of this conditioning setup is shown below.
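The architectural details are beyond this summary, but the following is a minimal PyTorch sketch of the general idea: project per-modality features into a shared space and condition a sequence decoder on them to emit CAD command tokens. The feature widths, vocabulary size, and the small Transformer decoder are placeholders, not the paper's actual encoders or LLM.

```python
import torch
import torch.nn as nn

# Placeholder sizes; the real backbone, tokenizer, and encoders are not given in this summary.
EMBED_DIM, VOCAB_SIZE = 512, 1024

class MultimodalToCAD(nn.Module):
    """Toy sketch: project per-modality features into a shared embedding space,
    then decode a CAD command-token sequence autoregressively against them."""
    def __init__(self):
        super().__init__()
        self.proj_image = nn.Linear(768, EMBED_DIM)    # hypothetical image-encoder feature width
        self.proj_points = nn.Linear(256, EMBED_DIM)   # hypothetical point-cloud-encoder feature width
        self.token_embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        layer = nn.TransformerDecoderLayer(EMBED_DIM, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)   # stand-in for the LLM
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)   # next CAD-command-token logits

    def forward(self, image_feat, point_feat, cmd_tokens):
        # One conditioning "token" per modality, used as decoder memory.
        memory = torch.stack([self.proj_image(image_feat),
                              self.proj_points(point_feat)], dim=1)
        tgt = self.token_embed(cmd_tokens)
        L = cmd_tokens.size(1)
        # Causal mask so each position only attends to earlier command tokens.
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(hidden)

# Smoke test with random stand-in features and a batch of 2 partial command sequences.
model = MultimodalToCAD()
logits = model(torch.randn(2, 768), torch.randn(2, 256),
               torch.randint(0, VOCAB_SIZE, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1024])
```

At inference, such a decoder would be sampled token by token until an end-of-sequence command, and the resulting command sequence replayed in a CAD kernel to recover the parametric model.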
Why it matters?
This research matters because it streamlines how CAD models are generated, letting designers and engineers create complex designs from whatever inputs they have on hand, whether text, images, or scans. By improving the efficiency and flexibility of CAD generation, CAD-MLLM can support advancements in fields like architecture, engineering, and product design.
Abstract
This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual descriptions, images, point clouds, or even a combination of them. Towards this goal, we introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal inputs. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and then employ advanced large language models (LLMs) to align the feature space across data from these diverse modalities and the CAD models' vectorized representations. To facilitate model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains a textual description, multi-view images, a point cloud, and the command sequence for each CAD model. It contains approximately 450K instances and their CAD construction sequences. To thoroughly evaluate the quality of our generated CAD models, we go beyond current evaluation metrics that focus on reconstruction quality by introducing additional metrics that assess topology quality and surface enclosure extent. Extensive experimental results demonstrate that CAD-MLLM significantly outperforms existing conditional generative methods and remains highly robust to noise and missing points. The project page and more visualizations can be found at: https://cad-mllm.github.io/
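The paper's exact topology and enclosure metrics are not spelled out in this summary. As a rough illustration of the kind of check a surface-enclosure metric implies, here is a minimal sketch, assuming the generated model is exported as a triangle mesh and that the trimesh library is available, that measures how close a surface is to being fully enclosed (watertight).

```python
import numpy as np
import trimesh  # assumed available; any loader exposing triangle faces would do

def enclosure_ratio(mesh_path: str) -> float:
    """Fraction of undirected edges shared by exactly two triangles.

    1.0 means every edge is interior, i.e. the surface is fully enclosed
    (watertight); lower values indicate open boundaries or dangling faces.
    """
    mesh = trimesh.load(mesh_path, force="mesh")
    faces = np.asarray(mesh.faces)
    # Collect the three edges of every triangle and sort vertex indices
    # so (a, b) and (b, a) count as the same undirected edge.
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges = np.sort(edges, axis=1)
    _, counts = np.unique(edges, axis=0, return_counts=True)
    return float((counts == 2).sum() / len(counts))

# Example: print(enclosure_ratio("generated_model.obj"))
```

This is only one plausible proxy; the paper defines its own topology and enclosure measures, which should be consulted for the actual evaluation protocol.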