🔥 Today we announce the Meta OMol25 Electronic Structures Dataset - 500 TB of molecular data in collaboration with @mshuaibii and team at @AIatMeta. We envision a future where researchers can rapidly design molecules and peptides to treat diseases, discover catalysts to revolutionize synthesis and manufacturing, identify the next electrolyte to store and transport energy to protect the grid, and more. But these breakthrough discoveries require data.
Data to train next-generation AI models and interatomic potentials. Data to push the boundaries of what's computationally possible in molecular chemistry and lead the world in AI for science. Data that captures the full complexity of chemical systems, from small organic molecules to massive biomolecular complexes.
The OMol25 Electronic Structures dataset includes the raw DFT outputs, electronic densities, wavefunctions, and molecular orbital information for over 4 million high-accuracy quantum chemical calculations. We see this as a transformative opportunity to develop higher-quality partial charges, partial spins, and advanced electronic features to unlock the next generation of physics-informed ML models.
The Materials Data Facility is proud to make these data available via the Eagle cluster at ALCF through a high-performance Globus endpoint. Given the dataset's unprecedented scale, we're first releasing all output data for a random 4M split of OMol25, with the full multi-petabyte dataset following based on community engagement.
For this first release, the data are quite raw, just as the Meta team created them. There's a significant opportunity for the community to build tools that simplify access to these data, support querying and browsing, create databases of calculated properties and descriptors, and much more. We intend to work on these topics with all of you.
We can't wait to see what you can do with these data!
Access Details:
github.com/facebookresearch/…
Eagle was pioneered as the Petrel project by Ian Foster, Rachana Ananthakrishnan, Kyle Chard, Michael Papka, Rick Stevens, and others as a new way to give researchers access to high-quality, high-volume data.
Globus.org provides core platform capabilities (auth, data transfer, workflow automation, and compute) to over 600k researchers.
Thanks to NIST and James Warren for their support in making possible the MDF vision of vast troves of open data to fuel discovery.
@mshuaibii, @zackulissi, @argonne, @argonne_lcf