AION-1: Omnimodal Foundation Model for Astronomical Sciences

Liam Parker, Francois Lanusse, Jeff Shen, Ollie Liu, Tom Hehir, Leopoldo Sarra, Lucas Meyer, Micah Bowles, Sebastian Wagner-Carena, Helen Qu, Siavash Golkar, Alberto Bietti, Hatim Bourfoune, Nathan Casserau, Pierre Cornette, Keiya Hirashima, Geraud Krawezik, Ruben Ohana, Nicholas Lourie, Michael McCabe, Rudy Morel, Payel Mukhopadhyay

2025-10-21

Summary

This paper introduces AION-1, a family of AI foundation models built specifically for astronomy that can work jointly with several types of astronomical data: images, spectra, and scalar measurements.

What's the problem?

Astronomy collects data in many different forms (images, spectra, and scalar measurements), and there hasn't been a unified model that can learn from all of them at the same time. Existing AI models often focus on just one type of data, so they miss connections between data types.

What's the solution?

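The researchers built AION-1 around a two-stage design. First, a tokenizer specific to each type of data (images, spectra, scalar measurements) converts it into tokens, a common format the model can process. Then a transformer is trained with masked modeling: parts of the combined token sequence are hidden and the model learns to predict them, which teaches it relationships *between* the different data types. The model was pretrained on five major astronomical surveys (Legacy Survey, HSC, SDSS, DESI, and Gaia), covering more than 200 million observations of stars, galaxies, and quasars. The researchers have released model variants ranging from 300 million to 3.1 billion parameters, along with the code, tokenizers, pretrained weights, and a lightweight evaluation suite.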
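The two-stage idea can be illustrated with a toy sketch. This is not the AION-1 implementation: the uniform-binning tokenizer, the vocabulary offsets, the `MASK_ID` value, and all function names below are assumptions made up for illustration. The sketch only shows how separately tokenized modalities can be joined into one sequence and partially hidden to form a masked-modeling training example.

```python
import random

# Hypothetical per-modality vocabulary offsets so token ids never collide.
MODALITY_OFFSETS = {"image": 0, "spectrum": 1000, "scalar": 2000}
MASK_ID = -1  # placeholder id for a hidden token (assumption)


def tokenize(modality, values, n_bins=16, lo=0.0, hi=1.0):
    """Stage 1 stand-in: quantize floats in [lo, hi] into integer tokens."""
    offset = MODALITY_OFFSETS[modality]
    tokens = []
    for v in values:
        clipped = min(max(v, lo), hi)
        bin_index = min(int((clipped - lo) / (hi - lo) * n_bins), n_bins - 1)
        tokens.append(offset + bin_index)
    return tokens


def build_sequence(observation):
    """Concatenate the tokens of every modality into one cross-modal sequence."""
    sequence = []
    for modality in sorted(observation):
        sequence.extend(tokenize(modality, observation[modality]))
    return sequence


def mask_sequence(sequence, mask_fraction, rng):
    """Stage 2 input: hide a random subset of tokens; return inputs and targets."""
    n_masked = max(1, int(len(sequence) * mask_fraction))
    masked_positions = set(rng.sample(range(len(sequence)), n_masked))
    inputs = [MASK_ID if i in masked_positions else t for i, t in enumerate(sequence)]
    targets = {i: sequence[i] for i in masked_positions}
    return inputs, targets


# Made-up observation with three modalities, all values scaled to [0, 1].
observation = {
    "image": [0.1, 0.5, 0.9, 0.3],
    "spectrum": [0.2, 0.4, 0.6],
    "scalar": [0.75],
}
sequence = build_sequence(observation)
inputs, targets = mask_sequence(sequence, mask_fraction=0.5, rng=random.Random(0))
print("full sequence:", sequence)
print("masked input: ", inputs)
```

In a real system a transformer would receive `inputs` and be trained to predict the entries of `targets`. Because the hidden positions can fall in any modality, the model has to use tokens from the other modalities to fill them in.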

Why does it matter?

AION-1 is important because it provides a single model that can perform many different astronomy tasks, such as estimating the properties of galaxies and stars, classifying galaxy morphology, and retrieving similar objects, all from a variety of data sources. It also serves as a blueprint for building similar AI systems for *other* scientific fields that deal with noisy, instrument-specific data, and the open-source release allows other scientists to build on this work.

Abstract

While foundation models have shown promise across a variety of fields, astronomy still lacks a unified framework for joint modeling across its highly diverse data modalities. In this paper, we present AION-1, a family of large-scale multimodal foundation models for astronomy. AION-1 integrates heterogeneous imaging, spectroscopic, and scalar data using a two-stage architecture: modality-specific tokenization followed by transformer-based masked modeling of cross-modal token sequences. The model is pretrained on five large-scale surveys: Legacy Survey, Hyper Suprime-Cam (HSC), Sloan Digital Sky Survey (SDSS), Dark Energy Spectroscopic Instrument (DESI), and Gaia. These span more than 200 million observations of stars, galaxies, and quasars. With a single frozen encoder, AION-1 achieves strong results on a broad suite of downstream tasks, including galaxy and stellar property estimation, galaxy morphology classification, similarity-based retrieval, galaxy image segmentation, and spectral super-resolution. We release AION-1 model variants ranging from 300 M to 3.1 B parameters. Beyond astronomy, AION-1 provides a scalable blueprint for multimodal scientific foundation models that can seamlessly integrate noisy, instrument-specific observations. All code, tokenizers, pretrained weights, and a lightweight evaluation suite are released under an open-source license.
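As a rough illustration of one downstream task listed above, similarity-based retrieval, the sketch below ranks catalog objects by cosine similarity to a query embedding. The embeddings, object names, and the `retrieve` helper are invented for illustration. In practice the vectors would come from the frozen encoder, and the abstract does not specify which similarity measure is used, so cosine similarity is an assumption here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, catalog, k=2):
    """Return the names of the k catalog entries most similar to the query."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up 3-dimensional embeddings; real encoder outputs would be much larger.
catalog = {
    "object_a": [1.0, 0.0, 0.0],
    "object_b": [0.9, 0.1, 0.0],
    "object_c": [0.0, 1.0, 0.0],
    "object_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]
top = retrieve(query, catalog, k=2)
print(top)
```

The encoder stays fixed in this setup: retrieval needs only the stored embeddings and a similarity function, with no additional training.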
