MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Shuangkang Fang, I-Chao Shen, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Shuchang Zhou, Wenrui Ding, Takeo Igarashi, Ming-Hsuan Yang

2025-08-11

Summary

This paper introduces MeshLLM, a method that enables large language models to understand and generate 3D meshes by decomposing the shapes into smaller, meaningful parts and training the model to assemble them back together correctly.

What's the problem?

3D meshes, which are shapes made of vertices and faces, are hard for language models to process: they are usually serialized into long, complicated text sequences that lose important 3D structural information. In addition, previous datasets were too small to train these models effectively.

What's the solution?

The paper presents a way to split 3D meshes into smaller meaningful pieces, called Primitive-Meshes, and builds a large dataset with over 1.5 million samples. The model is also trained to understand how parts connect locally, improving its ability to capture the overall 3D shape and topology. Together, these strategies help the model learn better and generate higher-quality 3D shapes than earlier methods.
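To make the two ideas above concrete, here is a minimal sketch of (a) serializing a mesh into text that a language model can consume and (b) splitting it into smaller sub-meshes. The OBJ-style `v`/`f` text format and the naive face-chunking heuristic are illustrative assumptions; the summary does not specify MeshLLM's actual serialization format or how Primitive-Meshes are extracted.

```python
# Hedged sketch: OBJ-style serialization and naive face chunking are
# assumptions for illustration, not MeshLLM's actual algorithm.

def serialize_mesh(vertices, faces):
    """Turn a mesh into an OBJ-style text sequence a language model can read."""
    lines = [f"v {x:.3f} {y:.3f} {z:.3f}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i) for i in face) for face in faces]
    return "\n".join(lines)

def split_into_primitives(vertices, faces, faces_per_part=2):
    """Naively group faces into sub-meshes, re-indexing each part's vertices."""
    parts = []
    for start in range(0, len(faces), faces_per_part):
        chunk = faces[start:start + faces_per_part]
        used = sorted({i for face in chunk for i in face})
        remap = {old: new + 1 for new, old in enumerate(used)}  # OBJ is 1-based
        part_vertices = [vertices[i - 1] for i in used]
        part_faces = [[remap[i] for i in face] for face in chunk]
        parts.append((part_vertices, part_faces))
    return parts

# A unit square made of two triangles (1-based face indices, OBJ convention).
verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
tris = [[1, 2, 3], [1, 3, 4]]

text = serialize_mesh(verts, tris)
primitives = split_into_primitives(verts, tris, faces_per_part=1)
print(text)
print(len(primitives))  # 2
```

Each part is a self-contained sub-mesh with its own re-indexed vertex list, so it can be serialized independently; this mirrors the paper's idea that training on smaller, locally coherent units is easier for the model than one long flat sequence.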

Why it matters?

This matters because it enables AI models to work more effectively with 3D objects, which is useful for gaming, virtual reality, animation, and design. Improving how AI understands and generates 3D meshes makes creating and editing 3D content easier and more efficient.

Abstract

MeshLLM uses large language models to generate and understand text-serialized 3D meshes by decomposing them into meaningful subunits and training with local mesh assembly strategies.