
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

Yun Luo, Futing Wang, Qianjia Cheng, Fangchen Yu, Haodi Lei, Jianhao Yan, Chenxi Li, Jiacheng Chen, Yufeng Zhao, Haiyuan Wan, Yuchen Zhang, Shenghe Zheng, Junchi Yao, Qingyang Zhang, Haonan He, Wenxuan Zeng, Li Sheng, Chengxing Xie, Yuxin Zuo, Yizhuo Li, Yulun Wu, Rui Huang

2026-02-11

Summary

This paper introduces P1-VL, a new family of open-source AI models designed to think like a scientist, specifically a physicist. The goal is to move beyond simply manipulating symbols to actually understanding and reasoning about the physical world, using both text and images.

What's the problem?

Current Large Language Models (LLMs) are good at working with words, but they struggle to connect that knowledge to the real world, especially when visual information is involved. In physics problems, diagrams aren't just pictures: they carry crucial information that isn't written in the text, such as specific measurements or symmetries. To solve complex physics problems accurately, a model needs to 'see' and understand these diagrams.

What's the solution?

The researchers created P1-VL, which combines two key techniques. First, 'Curriculum Reinforcement Learning' trains the model gradually, starting with easier problems and expanding to harder ones, which keeps the training process stable (a minimal sketch of this idea follows below). Second, 'Agentic Augmentation' lets the model check its own work at answer time and refine its solutions before committing to them. Tested on HiPhO, a benchmark built from recent physics-olympiad exams, the system achieved the best results among open-source models, even rivaling some of the top commercial ones.

Why it matters?

This work is important because it's a step towards creating AI that can truly understand and reason about the physical world. By making P1-VL open-source, the researchers are allowing others to build upon this foundation, potentially leading to breakthroughs in scientific discovery and more intelligent AI systems that can interact with and learn from the world around us.

Abstract

The transition from symbolic manipulation to science-grade reasoning represents a pivotal frontier for Large Language Models (LLMs), with physics serving as the critical test anchor for binding abstract logic to physical reality. Physics demands that a model maintain physical consistency with the laws governing the universe, a task that fundamentally requires multimodal perception to ground abstract logic in reality. At the Olympiad level, diagrams are often constitutive rather than illustrative, containing essential constraints, such as boundary conditions and spatial symmetries, that are absent from the text. To bridge this visual-logical gap, we introduce P1-VL, a family of open-source vision-language models engineered for advanced scientific reasoning. Our method harmonizes Curriculum Reinforcement Learning, which employs progressive difficulty expansion to stabilize post-training, with Agentic Augmentation, enabling iterative self-verification at inference. Evaluated on HiPhO, a rigorous benchmark of 13 exams from 2024-2025, our flagship P1-VL-235B-A22B becomes the first open-source Vision-Language Model (VLM) to secure 12 gold medals and achieves state-of-the-art performance among open-source models. Our agent-augmented system achieves the No. 2 overall rank globally, trailing only Gemini-3-Pro. Beyond physics, P1-VL demonstrates remarkable scientific reasoning capacity and generalizability, establishing significant leads over base models on STEM benchmarks. By open-sourcing P1-VL, we provide a foundational step toward general-purpose physical intelligence that better aligns visual perception with abstract physical laws for machine scientific discovery.