The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

Ruben Ohana, Michael McCabe, Lucas Meyer, Rudy Morel, Fruzsina J. Agocs, Miguel Beneitez, Marsha Berger, Blakesley Burkhart, Stuart B. Dalziel, Drummond B. Fielding, Daniel Fortunato, Jared A. Goldberg, Keiya Hirashima, Yan-Fei Jiang, Rich R. Kerswell, Suryanarayana Maddu, Jonah Miller, Payel Mukhopadhyay, Stefan S. Nixon, Jeff Shen, Romain Watteaux, Bruno Régaldo-Saint Blancard

2024-12-03

Summary

This paper introduces The Well, a large collection of diverse physics simulations designed to help researchers improve machine learning models for simulating physical systems.

What's the problem?

Standard datasets for training machine learning surrogate models tend to cover only a narrow range of physical behaviors, which makes it hard to judge whether a new method works beyond those few cases. This lack of variety slows research, because models need data from many different physical scenarios both to learn from and to be evaluated on.

What's the solution?

The Well addresses this by providing 15 terabytes of data across 16 datasets. The datasets are numerical simulations of a wide variety of spatiotemporal physical systems, including biological systems, fluid dynamics, acoustic scattering, and magneto-hydrodynamic simulations of extra-galactic fluids and supernova explosions. The data comes from domain experts and numerical software developers and is exposed through a unified PyTorch interface, so researchers can train and evaluate models on a wide array of scenarios without having to generate their own simulation data from scratch.
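
To make the idea of machine-learning-ready simulation data concrete, here is a minimal sketch in plain Python of how simulated trajectories might be cut into (input frames, target frames) pairs for training a surrogate model. It does not use The Well's actual interface: the class WindowedTrajectories, the helper make_trajectory, and the parameters n_in and n_out are hypothetical names chosen for illustration, and the travelling sine wave is a synthetic stand-in for real simulation output.

```python
import math


def make_trajectory(n_steps, n_points):
    """Synthetic stand-in for one simulation: a 1D travelling sine wave."""
    return [
        [math.sin(2 * math.pi * (x / n_points - 0.05 * t)) for x in range(n_points)]
        for t in range(n_steps)
    ]


class WindowedTrajectories:
    """Map-style dataset: each item is (input frames, target frames).

    Hypothetical helper for illustration; not part of The Well's library.
    """

    def __init__(self, trajectories, n_in=2, n_out=1):
        self.trajectories = trajectories
        self.n_in = n_in    # number of past frames given to the model
        self.n_out = n_out  # number of future frames the model must predict
        # Index every valid window as (trajectory index, start step).
        self.index = [
            (i, t)
            for i, traj in enumerate(trajectories)
            for t in range(len(traj) - n_in - n_out + 1)
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, k):
        i, t = self.index[k]
        traj = self.trajectories[i]
        inputs = traj[t : t + self.n_in]
        targets = traj[t + self.n_in : t + self.n_in + self.n_out]
        return inputs, targets


# Three synthetic simulations, each with 10 time steps on an 8-point grid.
trajectories = [make_trajectory(n_steps=10, n_points=8) for _ in range(3)]
dataset = WindowedTrajectories(trajectories, n_in=2, n_out=1)

inputs, targets = dataset[0]
print(len(dataset), len(inputs), len(targets), len(targets[0]))  # prints: 24 2 1 8
```

Because the class implements __len__ and __getitem__, it follows the map-style dataset protocol that PyTorch's DataLoader accepts. In practice the frames would be stored as tensors rather than nested lists.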

Why does it matter?

The Well gives scientists and engineers a shared, comprehensive resource for machine learning on physical simulations. Because the datasets are diverse and high quality, and can be used individually or together as a benchmark suite, they can accelerate the development of more accurate and efficient models. Better models could in turn improve our understanding of complex physical systems in fields like climate science, engineering, and astrophysics.

Abstract

Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.