2Xplat: Two Experts Are Better Than One Generalist

Hwasik Jeong, Seungryong Lee, Gyeongjin Kang, Seungkwon Yang, Xiangyu Sun, Seungtae Nam, Eunbyung Park

2026-03-25

Summary

This paper introduces a way to quickly build detailed 3D models from a set of 2D pictures, without needing to know where the camera was when each picture was taken.

What's the problem?

Current methods for creating these 3D models try to work out the camera positions and build the 3D model *at the same time* inside a single network. This 'all-in-one' approach can be limiting because it entangles two different tasks, understanding the scene's shape and modeling how it looks, in one shared representation, which may make high-quality results harder to achieve.

What's the solution?

The researchers developed a system called 2Xplat that breaks this down into two steps. First, one part of the program focuses solely on figuring out where the camera was when each picture was taken. Then, that information is passed to a second part of the program that specializes in creating the 3D model itself, using what it knows about how things look. It's like having one expert for geometry and another for appearance.
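The data flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `two_expert_pipeline`, `toy_geometry_expert`, `toy_appearance_expert`, and `Gaussian` are hypothetical, and the two toy experts are trivial stand-ins for the trained networks. The point is the interface: poses are predicted first and then passed explicitly to the Gaussian generator.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # placeholder: grayscale image as rows of pixels
Pose = List[List[float]]   # placeholder: 4x4 camera-to-world matrix


@dataclass
class Gaussian:
    """Minimal, hypothetical 3D Gaussian primitive."""
    mean: Tuple[float, float, float]
    scale: Tuple[float, float, float]
    opacity: float
    color: Tuple[float, float, float]


def toy_geometry_expert(images: Sequence[Image]) -> List[Pose]:
    """Stand-in for the geometry expert: returns one pose per image.

    Assumption for illustration: camera i sits at x = i with identity rotation.
    """
    poses = []
    for i, _ in enumerate(images):
        pose = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
        pose[0][3] = float(i)
        poses.append(pose)
    return poses


def toy_appearance_expert(images: Sequence[Image],
                          poses: Sequence[Pose]) -> List[Gaussian]:
    """Stand-in for the appearance expert: consumes images AND given poses.

    Assumption for illustration: emits one Gaussian per view, placed at the
    camera center and colored by the image's mean intensity.
    """
    gaussians = []
    for image, pose in zip(images, poses):
        pixels = [p for row in image for p in row]
        gray = sum(pixels) / len(pixels)
        center = (pose[0][3], pose[1][3], pose[2][3])
        gaussians.append(Gaussian(center, (1.0, 1.0, 1.0), 1.0, (gray, gray, gray)))
    return gaussians


def two_expert_pipeline(
    images: Sequence[Image],
    geometry_expert: Callable[[Sequence[Image]], List[Pose]],
    appearance_expert: Callable[[Sequence[Image], Sequence[Pose]], List[Gaussian]],
) -> List[Gaussian]:
    """Step 1: predict poses. Step 2: pass them explicitly to the generator."""
    poses = geometry_expert(images)
    if len(poses) != len(images):
        raise ValueError("geometry expert must return one pose per image")
    return appearance_expert(images, poses)


if __name__ == "__main__":
    views = [[[0.0, 1.0]], [[0.5, 0.5]]]  # two tiny 1x2 "images"
    for g in two_expert_pipeline(views, toy_geometry_expert, toy_appearance_expert):
        print(g)
```

Because the two experts only meet at the pose interface, either one could in principle be swapped for a different model without touching the other, which is the modularity argument the paper makes.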

Why it matters?

This two-step approach trains quickly: in fewer than 5K training iterations, it substantially outperforms earlier methods that do not need camera positions, and it matches state-of-the-art methods that are given the camera positions in advance. This suggests that building complex 3D systems from separate, specialized parts might be a better strategy than trying to create one giant program.

Abstract

Pose-free feed-forward 3D Gaussian Splatting (3DGS) has opened a new frontier for rapid 3D modeling, enabling high-quality Gaussian representations to be generated from uncalibrated multi-view images in a single forward pass. The dominant approach in this space adopts unified monolithic architectures, often built on geometry-centric 3D foundation models, to jointly estimate camera poses and synthesize 3DGS representations within a single network. While architecturally streamlined, such "all-in-one" designs may be suboptimal for high-fidelity 3DGS generation, as they entangle geometric reasoning and appearance modeling within a shared representation. In this work, we introduce 2Xplat, a pose-free feed-forward 3DGS framework based on a two-expert design that explicitly separates geometry estimation from Gaussian generation. A dedicated geometry expert first predicts camera poses, which are then explicitly passed to a powerful appearance expert that synthesizes 3D Gaussians. Although conceptually simple and largely underexplored in prior work, the proposed approach proves highly effective. In fewer than 5K training iterations, the proposed two-expert pipeline substantially outperforms prior pose-free feed-forward 3DGS approaches and achieves performance on par with state-of-the-art posed methods. These results challenge the prevailing unified paradigm and suggest the potential advantages of modular design principles for complex 3D geometric estimation and appearance synthesis tasks.
