VIGA: Vision-as-Inverse-Graphics Agent


Key Features

Reconstructs images as editable 3D scene programs using Blender and an analysis-by-synthesis loop.
Supports asset creation from geometric primitives and integration of external 3D asset generators like Meshy and SAM-3D.
Uses interleaved multimodal reasoning with contextual memory to iteratively refine scenes, physics, and interactions.
Introduces BlenderBench, a 30-task benchmark on which VIGA achieves over 100% average improvement over baselines.
Demonstrates strong performance and generalization on the BlenderGym graphics editing benchmark.

By interleaving multimodal reasoning with an evolving contextual memory, VIGA can 'vibe code' scenes, their physics, and interactions, building them from scratch using primitives or high-quality generated assets.
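The analysis-by-synthesis idea can be illustrated with a toy loop. The sketch below is not VIGA's implementation and does not use Blender: the "scene program" is a dictionary describing a single disc, `render` is a tiny rasterizer, `propose_edit` is a random stand-in for the multimodal reasoning step, and `Memory` is a hypothetical log of past attempts. All of these names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
import random

GRID = 16  # toy image resolution (assumption for this sketch)


def render(scene):
    """Rasterize a scene program (one disc) into a GRID x GRID binary image."""
    cx, cy, r = scene["cx"], scene["cy"], scene["r"]
    return [[1 if (x - cx) ** 2 + (y - cy) ** 2 <= r * r else 0
             for x in range(GRID)] for y in range(GRID)]


def image_error(a, b):
    """Count the pixels that differ between two images."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


@dataclass
class Memory:
    """Hypothetical contextual memory: a log of tried edits and their outcomes."""
    history: list = field(default_factory=list)


def propose_edit(scene, rng):
    """Stand-in for the reasoning step: nudge one parameter by +1 or -1."""
    key = rng.choice(sorted(scene))
    edited = dict(scene)
    edited[key] += rng.choice([-1, 1])
    return edited


def analysis_by_synthesis(target, scene, steps=500, seed=0):
    """Greedy loop: render a candidate, compare to the target, keep improvements."""
    rng = random.Random(seed)
    memory = Memory()
    best_err = image_error(render(scene), target)
    for _ in range(steps):
        if best_err == 0:
            break
        candidate = propose_edit(scene, rng)
        err = image_error(render(candidate), target)
        accepted = err < best_err
        memory.history.append((candidate, err, accepted))
        if accepted:
            scene, best_err = candidate, err
    return scene, best_err, memory


if __name__ == "__main__":
    target = render({"cx": 10, "cy": 6, "r": 4})
    start = {"cx": 8, "cy": 8, "r": 2}
    scene, err, memory = analysis_by_synthesis(target, start)
    print("recovered scene:", scene, "remaining pixel error:", err)
    print("edits tried:", len(memory.history))
```

The loop only accepts edits that strictly reduce the pixel error, so it can stall before reaching a perfect match. A real system would presumably replace the random proposals with model-driven edits informed by the memory log, and the rasterizer with an actual renderer.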


Evaluated on the new BlenderBench benchmark with 30 challenging tasks and on BlenderGym, VIGA significantly outperforms strong baselines, showing robust generalization to diverse graphics editing and programmatic content creation tasks.
