Voxify3D: Pixel Art Meets Volumetric Rendering

Yi-Chuan Huang, Jiewen Chan, Hao-Jen Chien, Yu-Lun Liu

2025-12-09

Summary

This paper introduces a new method called Voxify3D for automatically turning 3D models into voxel art, which is that blocky, retro style you often see in games like Minecraft.

What's the problem?

Creating voxel art from 3D models is surprisingly hard. Existing methods either make the voxel art too simple and lose important details, or they don't quite capture the specific look of real voxel art – things like limited color palettes and sharp, pixel-perfect edges. It's difficult to balance simplifying the shape while still making it look good and recognizable, especially when you're restricting the number of colors used.

What's the solution?

Voxify3D is a two-stage framework built around three main ideas. First, it renders the model with orthographic views, which have no perspective distortion, so that each pixel in the rendered image lines up exactly with the voxel grid. Second, it uses CLIP on image patches to check that the overall meaning of the 3D model is still clear in the voxel art, even after it has been simplified. Third, it uses Gumbel-Softmax to choose colors from a limited palette while keeping that choice differentiable, so the computer can still adjust the colors during optimization and the final result looks more like hand-crafted voxel art.
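To see why a perspective-free (orthographic) view helps with alignment, here is a minimal NumPy sketch. It is an illustration only, not the authors' code; the function names and the toy points are assumptions made for this example. Two voxel centers that share a grid column land on the same pixel under orthographic projection, but on different pixels under a simple pinhole perspective projection.

```python
import numpy as np

def project_orthographic(points):
    """Orthographic projection along z: keep x and y, ignore depth."""
    return points[:, :2].copy()

def project_perspective(points, focal=1.0):
    """Pinhole projection (hypothetical camera at the origin looking along +z):
    x and y are divided by depth, so farther points move toward the center."""
    return focal * points[:, :2] / points[:, 2:3]

# Two voxel centers in the same grid column (same x, y) at different depths.
column = np.array([[1.0, 2.0, 2.0],
                   [1.0, 2.0, 4.0]])

ortho = project_orthographic(column)
persp = project_perspective(column)

print("orthographic:", ortho.tolist())
print("perspective: ", persp.tolist())
```

Under the orthographic view both points project to the same (x, y), which is what lets one image pixel correspond to one voxel column. Under perspective the two points drift apart on screen, so pixel boundaries no longer match voxel boundaries.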

Why does it matter?

This research is important because it makes it easier to automatically create voxel art. This could be useful for game developers who want to quickly generate assets, or for artists who want a tool to help them explore different voxel art styles. In the authors' experiments, the method reaches a CLIP-IQA score of 37.12 and a 77.90% user preference, and it allows control over how detailed or abstract the final result is (20x to 50x resolutions) as well as the number of colors used (2 to 8).

Abstract

Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting requirements of geometric abstraction, semantic preservation, and discrete color coherence. Existing methods either over-simplify geometry or fail to achieve the pixel-precise, palette-constrained aesthetics of voxel art. We introduce Voxify3D, a differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Our core innovation lies in the synergistic integration of three components: (1) orthographic pixel art supervision that eliminates perspective distortion for precise voxel-pixel alignment; (2) patch-based CLIP alignment that preserves semantics across discretization levels; (3) palette-constrained Gumbel-Softmax quantization enabling differentiable optimization over discrete color spaces with controllable palette strategies. This integration addresses fundamental challenges: semantic preservation under extreme discretization, pixel-art aesthetics through volumetric rendering, and end-to-end discrete optimization. Experiments show superior performance (37.12 CLIP-IQA, 77.90% user preference) across diverse characters and controllable abstraction (2-8 colors, 20x-50x resolutions). Project page: https://yichuanh.github.io/Voxify-3D/
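The palette-constrained Gumbel-Softmax quantization named in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: the palette, the logits, the temperature value, and all function names are hypothetical. A real training setup would use an automatic-differentiation framework so that gradients reach the logits; NumPy is used here only to show the computation.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    # Clip the uniform samples so both logarithms stay finite.
    u = np.clip(rng.uniform(size=logits.shape), 1e-9, 1.0 - 1e-9)
    gumbel = -np.log(-np.log(u))
    scaled = (logits + gumbel) / tau
    scaled -= scaled.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical 4-color RGB palette (the abstract reports 2-8 colors).
palette = np.array([[0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [0.8, 0.1, 0.1],
                    [0.1, 0.2, 0.8]])

# One row of palette logits per voxel; these would be the learnable parameters.
logits = rng.normal(size=(5, palette.shape[0]))

# During optimization: soft weights give a blend of palette colors.
weights = gumbel_softmax(logits, tau=0.5, rng=rng)
soft_colors = weights @ palette

# At export time: each voxel snaps to its single most likely palette color.
hard_colors = palette[logits.argmax(axis=-1)]
```

Lowering the temperature `tau` pushes the soft weights toward one-hot vectors, so the blended colors approach exact palette entries while the selection remains a smooth function of the logits.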
