Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
Yicheng Zou, Dongsheng Zhu, Lin Zhu, Tong Zhu, Yunhua Zhou, Peiheng Zhou, Xinyu Zhou, Dongzhan Zhou, Zhiwang Zhou, Yuhao Zhou, Bowen Zhou, Zhanping Zhong, Zhijie Zhong, Haiteng Zhao, Penghao Zhao, Xiaomeng Zhao, Zhiyuan Zhao, Yechen Zhang, Jin Zhang, Wenwei Zhang, Hongjie Zhang, Zhuo Zhang
2026-03-28
Summary
This paper introduces Intern-S1-Pro, a one-trillion-parameter artificial intelligence model designed to perform well on both everyday tasks and specialized scientific work.
What's the problem?
Existing AI models often struggle to be excellent in both general knowledge *and* specific scientific fields. They tend to do well either on broad tasks like understanding language or images, or within a single area of science, but rarely both at a high level. Building a model capable of handling both is incredibly difficult because of the sheer amount of data and computing power needed.
What's the solution?
The researchers created Intern-S1-Pro by scaling the model up to an unprecedented one trillion parameters. They also developed two supporting tools, XTuner and LMDeploy, to make training and running such a massive model efficient while keeping training and inference numerically consistent. This allowed the AI to learn from a huge amount of data, improving its reasoning, image understanding, and ability to perform over 100 different scientific tasks in fields such as chemistry, materials science, life sciences, and earth sciences.
Why it matters?
Intern-S1-Pro is important because it represents a significant step forward in creating AI that can assist scientists and researchers. It ranks among the top open-source models on general tasks and outperforms other AI systems, even proprietary ones that aren't publicly available, in specialized scientific areas. Being an open-source model means other researchers can build upon this work, accelerating progress in many scientific fields.
Abstract
We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaled to this unprecedented size, the model delivers a comprehensive enhancement across both general and scientific domains. Beyond stronger reasoning and image-text understanding capabilities, its intelligence is augmented with advanced agent capabilities. Simultaneously, its scientific expertise has been vastly expanded to master over 100 specialized tasks across critical science fields, including chemistry, materials, life sciences, and earth sciences. Achieving this massive scale is made possible by the robust infrastructure support of XTuner and LMDeploy, which facilitates highly efficient Reinforcement Learning (RL) training at the one-trillion-parameter level while ensuring strict precision consistency between training and inference. By seamlessly integrating these advancements, Intern-S1-Pro further fortifies the fusion of general and specialized intelligence, working as a Specializable Generalist that sits in the top tier of open-source models for general capabilities while outperforming proprietary models in the depth of specialized scientific tasks.