Orient Anything V2: Unifying Orientation and Rotation Understanding
Zehan Wang, Ziang Zhang, Jiayang Xu, Jialei Wang, Tianyu Pang, Chao Du, Hengshuang Zhao, Zhou Zhao
2026-01-12
Summary
This paper introduces Orient Anything V2, an improved computer vision model that can determine the 3D orientation and rotation of objects from a single image or a pair of images.
What's the problem?
Existing methods struggle to accurately determine an object's orientation, especially for objects that look the same after being rotated (think of a square table or a cylinder). They also have trouble estimating *relative* rotations, such as how much an object has turned compared to its starting position, and they need large amounts of specifically labeled data to work well.
What's the solution?
The researchers tackle this by training the model on a large collection of 3D assets synthesized by generative models, which gives broad category coverage and a balanced data distribution. They also developed an efficient, model-in-the-loop annotation system that automatically labels the 'front' of each object, even when an object has multiple valid fronts due to rotational symmetry. The model learns to account for these symmetries and directly predicts how an object rotates from one view to another, using information from multiple images when available.
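To make the symmetry handling concrete: for an object with N-fold rotational symmetry, every rotation of the labeled front by 360°/N is an equally valid front, so the training target should place a mode at each of them rather than a single peak. The sketch below builds such a periodic target distribution over discretized azimuth angles. It is a minimal illustration under my own assumptions (wrapped-Gaussian modes, the function name, and the bin layout are invented here), not the paper's exact fitting objective.

```python
import numpy as np

def symmetry_aware_target(front_deg, n_fold, num_bins=360, sigma=5.0):
    """Target distribution over azimuth bins for an object with n_fold
    rotational symmetry: one wrapped-Gaussian mode per equivalent front
    face. (Illustrative sketch, not the paper's exact loss.)"""
    bins = np.arange(num_bins) * (360.0 / num_bins)
    target = np.zeros(num_bins)
    for k in range(n_fold):
        center = (front_deg + 360.0 * k / n_fold) % 360.0
        # circular (wrapped) distance from each bin to this mode
        d = np.abs((bins - center + 180.0) % 360.0 - 180.0)
        target += np.exp(-0.5 * (d / sigma) ** 2)
    return target / target.sum()  # normalize to a probability distribution

# A 4-fold symmetric object (e.g. a square table) whose annotated front is
# at 30 degrees has equally valid fronts at 30, 120, 210, and 300 degrees.
p = symmetry_aware_target(30.0, n_fold=4)
peaks = np.argsort(p)[-4:]
```

Fitting a predicted distribution to a target like `p` (e.g. with a cross-entropy or KL objective) rewards the model for spreading mass over all plausible fronts instead of being penalized for picking a symmetric alternative.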
Why it matters?
This work is important because it makes orientation estimation much more accurate and versatile. Orient Anything V2 outperforms previous methods across 11 widely used benchmarks and can be used in many applications, like robotics, augmented reality, and even understanding how objects are arranged in images or videos. It's a big step towards computers 'seeing' and understanding the 3D world like humans do.
Abstract
This work presents Orient Anything V2, an enhanced foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Building upon Orient Anything V1, which defines orientation via a single unique front face, V2 extends this capability to handle objects with diverse rotational symmetries and directly estimate relative rotations. These improvements are enabled by four key innovations: 1) Scalable 3D assets synthesized by generative models, ensuring broad category coverage and balanced data distribution; 2) An efficient, model-in-the-loop annotation system that robustly identifies 0 to N valid front faces for each object; 3) A symmetry-aware, periodic distribution fitting objective that captures all plausible front-facing orientations, effectively modeling object rotational symmetry; 4) A multi-frame architecture that directly predicts relative object rotations. Extensive experiments show that Orient Anything V2 achieves state-of-the-art zero-shot performance on orientation estimation, 6DoF pose estimation, and object symmetry recognition across 11 widely used benchmarks. The model demonstrates strong generalization, significantly broadening the applicability of orientation estimation in diverse downstream tasks.
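The fourth innovation in the abstract is a multi-frame architecture that directly predicts relative object rotations between views. The quantity being estimated can be sketched in rotation-matrix terms as follows; this is a minimal illustration of the geometry (the helper names are my own, and this is not the paper's architecture):

```python
import numpy as np

def rot_z(deg):
    """Rotation about the z (up) axis by `deg` degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def relative_rotation(r1, r2):
    """Rotation taking the object's pose in frame 1 to its pose in frame 2."""
    return r2 @ r1.T

def geodesic_angle_deg(r):
    """Magnitude of a rotation matrix, as an angle in degrees."""
    cos_a = np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_a))

# An object seen at 20 degrees azimuth in the first frame and 65 degrees
# in the second has turned by 45 degrees between the two views.
r_rel = relative_rotation(rot_z(20.0), rot_z(65.0))
angle = geodesic_angle_deg(r_rel)
```

Predicting `r_rel` directly, rather than differencing two independent per-frame estimates, avoids the ambiguity that symmetric objects introduce into absolute orientation; the geodesic angle above is also the standard error metric for comparing rotations.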