By interleaving multimodal reasoning with an evolving contextual memory, VIGA can 'vibe code' scenes together with their physics and interactions, building them from scratch using primitives or high-quality generated assets.
Evaluated on BlenderGym and on BlenderBench, a new benchmark of 30 challenging tasks, VIGA significantly outperforms strong baselines and generalizes robustly to diverse graphics editing and programmatic content creation tasks.


