Bayesian Optimization in Chemical Compound Sub-spaces Using Low-dimensional Molecular Descriptors
1) This work presents a data-efficient Bayesian optimization framework that can identify optimal molecular structures with fewer than 2,000 training points in a chemical sub-space containing over 133,000 molecules.
2) The key innovation is a reliable inverse mapping scheme that translates optimized points in descriptor space back into chemically valid molecular structures, bridging the gap between continuous optimization and discrete molecular design.
3) The framework employs low-dimensional, physics-informed molecular descriptors that enable accurate Gaussian Process Regression even with limited training data, addressing the curse of dimensionality that plagues traditional molecular optimization.
4) For entropy optimization, the approach achieves a 100% success rate while requiring fewer than 1,000 molecular evaluations in more than 80% of test cases on the QM9 benchmark dataset.
5) For zero-point vibrational energy (ZPVE), the success rate exceeds 80% for molecules containing more than two heavy atoms, demonstrating robust performance across different molecular properties.
6) The inverse mapping algorithm predicts chemical formulas from descriptor vectors by matching predicted stoichiometry and shape characteristics against molecular databases, with a fallback penalty for chemically implausible suggestions.
7) The method outperforms conventional generative approaches that typically require large datasets, making it particularly suitable for data-scarce settings in molecular discovery.
8) The descriptors combine Coulomb matrix eigenvalues with inner products of atomic reference probability densities, capturing both global molecular shape and local atomic environment information.
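To make point 8 concrete, here is a minimal sketch of the Coulomb-matrix half of the descriptor. It uses the standard textbook definition (diagonal 0.5·Z^2.4, off-diagonal Z_i·Z_j / |R_i − R_j|, atomic units). The function name `coulomb_matrix_eigenvalues` is made up for illustration, the paper's exact variant may differ, and the inner products of atomic reference densities are not reproduced here.

```python
import numpy as np

def coulomb_matrix_eigenvalues(charges, coords):
    """Eigenvalues of the standard Coulomb matrix, sorted by descending magnitude.

    charges: nuclear charges Z_i; coords: Cartesian positions (assumed in bohr).
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Pairwise interatomic distances |R_i - R_j|
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder so the division below is safe
    cm = np.outer(z, z) / dist   # off-diagonal: Z_i * Z_j / |R_i - R_j|
    np.fill_diagonal(cm, 0.5 * z ** 2.4)  # diagonal: 0.5 * Z_i^2.4
    eig = np.linalg.eigvalsh(cm)  # the matrix is real and symmetric
    return eig[np.argsort(-np.abs(eig))]
```

The eigenvalue spectrum does not change when atoms are reordered or the molecule is rotated or translated, which is one reason it works as a compact global descriptor.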
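The loop behind points 1 to 6 can be sketched roughly as follows: fit a Gaussian process on the molecules evaluated so far, pick a promising point in descriptor space with an acquisition function, then map that point back to a real molecule. Everything here is an assumption for illustration, not the paper's implementation. The RBF kernel, expected improvement, random proposal sampling, and the names `gp_posterior`, `expected_improvement` and `optimize` are all made up or swapped in. In particular, the inverse mapping is replaced by a plain nearest-neighbor lookup in a descriptor database, a simplified stand-in for the stoichiometry and shape matching described in point 6.

```python
import math
import numpy as np

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two sets of descriptor vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """GP posterior mean and standard deviation on standardized targets."""
    y_mean = y_train.mean()
    y_std = y_train.std() or 1.0  # avoid dividing by zero for constant targets
    y = (y_train - y_mean) / y_std
    k = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    ks = rbf(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    v = np.linalg.solve(chol, ks)
    var = np.clip(1.0 - (v ** 2).sum(0), 1e-12, None)
    return y_mean + y_std * (ks.T @ alpha), y_std * np.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement for maximization."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def optimize(database, evaluate, n_init=5, n_iter=30, n_proposals=2000, seed=0):
    """Maximize evaluate(index) over rows of `database` (descriptor vectors)."""
    rng = np.random.default_rng(seed)
    seen = [int(i) for i in rng.choice(len(database), size=n_init, replace=False)]
    values = [evaluate(i) for i in seen]
    lo, hi = database.min(0), database.max(0)
    for _ in range(n_iter):
        if len(seen) == len(database):
            break  # every molecule has been evaluated
        # Propose continuous points in descriptor space and score them
        proposals = rng.uniform(lo, hi, size=(n_proposals, database.shape[1]))
        mean, std = gp_posterior(database[seen], np.array(values), proposals)
        target = proposals[np.argmax(expected_improvement(mean, std, max(values)))]
        # Stand-in inverse mapping: nearest not-yet-evaluated database molecule
        dist = np.linalg.norm(database - target, axis=1)
        dist[seen] = np.inf
        idx = int(np.argmin(dist))
        seen.append(idx)
        values.append(evaluate(idx))
    best = int(np.argmax(values))
    return seen[best], values[best]
```

Here `database` would hold one low-dimensional descriptor vector per molecule and `evaluate` would return the property of interest (for example entropy) for a given row. The total number of property evaluations is at most `n_init + n_iter`, which is the quantity the thread's data-efficiency numbers refer to.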
📜Paper: arxiv.org/abs/2603.02605
#BayesianOptimization #MolecularDesign #InverseDesign #GaussianProcess #QM9 #ChemicalSpace #LowDimensionalDescriptors #MolecularOptimization #ComputationalChemistry #MachineLearning