Heterogeneous Scientific Foundation Model Collaboration
Zihao Li, Jiaru Zou, Feihao Fang, Xuying Ning, Mengting Ai, Tianxin Wei, Sirui Chen, Xiyuan Yang, Jingrui He
2026-05-01
Summary
This paper introduces Eywa, a framework that lets AI agents work with more than just text, focusing on collaboration with specialized AI models built for scientific domains.
What's the problem?
Current AI agents, even the really good ones, mostly rely on understanding and generating language. This limits what they can do when dealing with real-world problems, especially in fields like science where there's a lot of data that *isn't* text – things like chemical structures, images of cells, or complex datasets. These fields often have their own powerful AI models, but they don't easily connect to the language-based agents.
What's the solution?
The researchers created Eywa, which acts as a translator between language-based AI and these specialized scientific models. Eywa attaches a language-based 'brain' to each specialized model, so the main AI agent can ask questions and guide it to make predictions or analyze data. Eywa can either stand in for a single agent in an existing pipeline (EywaAgent) or slot into a multi-agent system alongside traditional agents (EywaMAS). The team also built EywaOrchestra, a planner that dynamically decides which model is best suited to each step of a problem.
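To make the orchestration idea concrete, here is a minimal sketch of a planner loop that routes each step of a task either to a generic language agent or to a domain model wrapped with a language interface. All names here (`EywaStyleAgent`, `orchestrate`, the registry keys, and the stand-in predictors) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of Eywa-style orchestration: a planner routes each
# step to the specialist registered for its data modality, falling back
# to a generic text agent. Predictors here are toy stand-ins for real
# scientific foundation models.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Step:
    modality: str   # e.g. "text", "molecule", "cell_image"
    payload: str

class EywaStyleAgent:
    """Wraps a predictive model with a language-facing interface."""
    def __init__(self, name: str, predict: Callable[[str], str]):
        self.name = name
        self.predict = predict

    def run(self, step: Step) -> str:
        # The language interface turns the raw prediction into a
        # natural-language answer the planner can reason over.
        return f"[{self.name}] result for {step.payload!r}: {self.predict(step.payload)}"

def orchestrate(steps: List[Step], registry: Dict[str, EywaStyleAgent]) -> List[str]:
    """Planner loop: dispatch each step by modality."""
    results = []
    for step in steps:
        agent = registry.get(step.modality, registry["text"])
        results.append(agent.run(step))
    return results

# Toy registry: a language model and one wrapped chemistry model.
registry = {
    "text": EywaStyleAgent("llm", lambda x: x.upper()),
    "molecule": EywaStyleAgent("chem-fm", lambda x: f"property estimate for {x}"),
}

steps = [Step("molecule", "CCO"), Step("text", "summarize findings")]
for line in orchestrate(steps, registry):
    print(line)
```

The key design point this sketch mirrors is that the specialized model never has to speak natural language itself; its wrapper translates predictions into text the planner can use for the next reasoning step.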
Why does it matter?
This work is important because it expands the capabilities of AI agents beyond just language. By allowing them to effectively use specialized AI models, Eywa can solve more complex problems in science and other fields where non-text data is crucial, and it reduces the need for everything to be explained in words.
Abstract
Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.