cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna Vorontsova, Anton Konushin, Vladislav Kurenkov, Danila Rukhovich
2025-05-30

Summary
This paper presents cadrille, an AI system that reconstructs detailed 3D models for computer-aided design (CAD) from multiple kinds of input, such as images and text, and improves itself through trial and error using reinforcement learning.
What's the problem?
Creating accurate 3D CAD models from real-world data is difficult, especially when the only inputs available are a mix of images and written descriptions. Traditional methods often struggle to combine these different kinds of information and tend to perform poorly on real-world examples.
What's the solution?
The researchers combined vision-language models (VLMs), which understand both images and text, with reinforcement learning, in which the model receives feedback on its own attempts and adjusts to produce better ones. This lets the system improve its 3D reconstructions over time and perform well even on challenging real-world datasets.
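The "feedback on its attempts" idea can be illustrated with a minimal sketch. It assumes a hypothetical setup that this summary does not describe: each attempt is a candidate shape stored as a set of occupied voxels, it is scored by intersection-over-union (IoU) with the target shape, an attempt that fails to produce a shape gets a fixed penalty, and several attempts for the same input are compared with one another through a group-normalized advantage (in the style of GRPO). The voxel representation, the penalty value, and the names `iou_reward` and `group_advantages` are illustrative assumptions, not cadrille's actual implementation.

```python
from statistics import mean, pstdev
from typing import List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]

# Hypothetical reward for an attempt that fails to produce a valid shape
INVALID_REWARD = 0.0


def iou_reward(pred: Optional[Set[Voxel]], target: Set[Voxel]) -> float:
    """Score one attempt by voxel IoU with the target shape."""
    if pred is None:  # attempt could not be turned into a shape
        return INVALID_REWARD
    union = pred | target
    if not union:  # both shapes empty: treat as a perfect match
        return 1.0
    return len(pred & target) / len(union)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within a group of attempts for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Target: a 2x2x2 block of voxels
target = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
attempts = [
    set(target),                        # exact reconstruction
    {v for v in target if v[2] == 0},   # only the bottom half of the block
    None,                               # invalid attempt
]
rewards = [iou_reward(a, target) for a in attempts]
advantages = group_advantages(rewards)
print(rewards)  # [1.0, 0.5, 0.0]
```

Attempts that score above the group average receive a positive advantage and would be reinforced during training, while the invalid attempt receives a negative one. Comparing attempts within a group avoids needing a separately learned estimate of how good an "average" attempt is.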
Why does it matter?
This work could make it easier and faster to create accurate 3D models for fields such as engineering, architecture, and manufacturing, even when the starting information is messy or incomplete, helping professionals save time and produce better designs.
Abstract
A multi-modal CAD reconstruction model leveraging vision-language models and reinforcement learning achieves state-of-the-art performance across various datasets, including real-world ones.