OSCAR is designed to predict future visual observations conditioned on actions, across different embodiments. By using a shared 2D skeleton interface, OSCAR can reason about robots and hands in a common control representation instead of requiring a separate world model for each hardware form.
OSCAR is useful for robotics researchers who want a visual world model for policy evaluation, data processing, and embodiment-transfer experiments. The page links to an arXiv paper, code on GitHub, and a model on Hugging Face, which makes it practical to reproduce the research.


