An open quantum chemistry property database of 120 kilo molecules with 20 million conformers
• The paper introduces “QO2Mol,” a large-scale open quantum chemistry dataset of 120,000 organic molecules and 20 million conformers, intended to support research in computational chemistry and drug discovery.
• The dataset covers 10 elements (C, H, O, N, S, P, F, Cl, Br, I) and provides quantum mechanical properties calculated at the B3LYP/def2-SVP level of theory. This supports studies of structure-property relationships on realistic organic molecules.
• A standout feature is its focus on precision: the calculations consumed around 10 million CPU core-hours and yield potential energy, forces, and additional attributes for predicting molecular behavior.
• Compared to existing datasets like QM9 and ANI-1, QO2Mol offers greater diversity in molecular structures and elements, which broadens its applicability to fields such as drug discovery, material science, and AI model training.
• The dataset is accompanied by benchmark code and scripts, so researchers can integrate it when developing or improving AI models for molecular prediction tasks.
• QO2Mol is designed to fill gaps in existing datasets, which either lack elemental diversity or are limited to low heavy atom counts, and so to support more reliable and realistic modeling of organic molecules.
• Benchmarking tests show that GemNet achieves the lowest prediction error among the tested models on potential energy tasks, demonstrating the dataset’s potential for training high-performing models.
• By covering both equilibrium and near-equilibrium conformers, the dataset could help advance quantum chemistry, improve machine learning model accuracy, and support new drug discovery.
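To make the points about element coverage, heavy atom counts, and prediction error concrete, here is a minimal Python sketch. It is not taken from the QO2Mol repository: the `Conformer` record, the helper names, and the choice of mean absolute error as the metric are all assumptions for illustration (the post only says "prediction error"), and the energies are made-up numbers.

```python
from dataclasses import dataclass
from typing import List

# The ten elements the post lists for QO2Mol.
QO2MOL_ELEMENTS = frozenset({"C", "H", "O", "N", "S", "P", "F", "Cl", "Br", "I"})


@dataclass
class Conformer:
    """Hypothetical record layout, NOT the dataset's actual schema."""
    symbols: List[str]   # element symbol of each atom
    energy: float        # potential energy (units left unspecified here)


def in_element_set(conf, allowed=QO2MOL_ELEMENTS):
    """True if every atom in the conformer is one of the allowed elements."""
    return set(conf.symbols) <= allowed


def heavy_atom_count(conf):
    """Number of non-hydrogen atoms."""
    return sum(1 for s in conf.symbols if s != "H")


def energy_mae(predicted, reference):
    """Mean absolute error between predicted and reference energies.

    MAE is an assumed metric; the benchmark scripts may use a different one.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Illustrative usage with made-up values.
ethanol = Conformer(symbols=["C", "C", "O"] + ["H"] * 6, energy=-1.0)
print(in_element_set(ethanol), heavy_atom_count(ethanol))
print(energy_mae([1.0, 2.0], [1.5, 1.0]))
```

For the actual data layout and evaluation protocol, the benchmark scripts in the linked repository are the reference.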
💻Code:
github.com/saiscn/QO2Mol
📜Paper:
arxiv.org/abs/2410.19316
#quantumchemistry #computationalchemistry #machinelearning #drugdiscovery #QO2Mol #opendataset #chemistry #materialscience