Evaluation of machine learning models for condition optimization in diverse amide coupling reactions
1. This study explores how machine learning can optimize reaction conditions for amide coupling, a crucial reaction in medicinal chemistry that accounts for over 40% of synthetic transformations. The authors standardize and filter reaction data from the Open Reaction Database (ORD) and train 13 different machine learning models to predict optimal reaction conditions and yields.
2. A key innovation is the use of structural data at several levels, from 1D SMILES strings to 2D Morgan fingerprints and 3D XYZ coordinates, to improve model performance. The study finds that higher-dimensional representations significantly improve the accuracy of coupling-agent classification, while their effect on yield prediction depends more strongly on the model.
3. The Extra Trees Classifier was the best-performing model for classifying coupling agents, with an accuracy of 0.873. For yield prediction, the Gradient Boosting Regressor performed best, with an R² of 0.801. These results highlight the potential of ensemble-based models for handling complex reaction data.
4. The study also examines how different molecular properties affect model performance. It concludes that features describing the local molecular environment, such as those captured by XYZ coordinates and Morgan fingerprints, are more informative for prediction than bulk properties such as molecular weight and logP.
5. Validation on literature-reported data from outside the ORD showed that the models struggled with substrates containing multiple competing amines and with substituted benzoic acids. This suggests that the diversity and availability of training data are critical factors limiting model accuracy.
6. The authors propose that increasing the availability of open-source reaction data could further improve predictive performance. They also suggest that the workflow could be extended to other common reactions, such as Suzuki and Buchwald-Hartwig cross-couplings, opening new avenues for machine learning in chemical synthesis.
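Point 1 mentions standardizing and filtering ORD records before training. The study's exact filtering rules are not described here, so the following is only a minimal sketch under stated assumptions: the `Reaction` record, the `COUPLING_AGENT_SYNONYMS` map and the `standardize_and_filter` function are hypothetical, records are plain Python objects rather than ORD's own schema, and SMILES strings are compared as-is instead of being canonicalized with a cheminformatics toolkit.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical synonym map: merges different spellings of the same coupling agent.
COUPLING_AGENT_SYNONYMS = {
    "hatu": "HATU",
    "edc": "EDC",
    "edci": "EDC",
    "edc.hcl": "EDC",
    "t3p": "T3P",
}


@dataclass(frozen=True)
class Reaction:
    """Hypothetical flattened reaction record (not the ORD schema itself)."""
    acid_smiles: str
    amine_smiles: str
    coupling_agent: str
    yield_percent: Optional[float]


def standardize_and_filter(records: list[Reaction]) -> list[Reaction]:
    """Normalize coupling-agent names, drop unusable rows, and remove duplicates."""
    seen: set[tuple[str, str, str]] = set()
    cleaned: list[Reaction] = []
    for r in records:
        agent = COUPLING_AGENT_SYNONYMS.get(r.coupling_agent.strip().lower())
        if agent is None:
            continue  # coupling agent not in the synonym map
        if r.yield_percent is None or not 0.0 <= r.yield_percent <= 100.0:
            continue  # missing or physically implausible yield
        # SMILES are compared as plain strings here; a real pipeline would canonicalize them.
        key = (r.acid_smiles, r.amine_smiles, agent)
        if key in seen:
            continue  # keep only the first copy of a duplicate reaction/agent pair
        seen.add(key)
        cleaned.append(Reaction(r.acid_smiles, r.amine_smiles, agent, r.yield_percent))
    return cleaned


# Toy records for illustration only, not data from the study.
raw = [
    Reaction("CC(=O)O", "NCc1ccccc1", "HATU", 85.0),
    Reaction("CC(=O)O", "NCc1ccccc1", "hatu", 80.0),  # duplicate once the name is normalized
    Reaction("OC(=O)c1ccccc1", "CN", "EDCI", 120.0),  # yield above 100%: dropped
    Reaction("OC(=O)c1ccccc1", "CN", "DCC", 70.0),    # agent not in the map: dropped
    Reaction("OC(=O)c1ccccc1", "CN", "edc.hcl", 64.0),
]
for r in standardize_and_filter(raw):
    print(r.coupling_agent, r.yield_percent)
# HATU 85.0
# EDC 64.0
```

Deduplicating on the normalized agent name matters here: without it, the same reaction recorded under two spellings of one reagent would be counted as two different conditions.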
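Point 3 reports accuracy for the coupling-agent classifier and R² for the yield regressor. Both are standard metrics; the sketch below computes them in plain Python on made-up toy values (not data from the study), assuming accuracy is the fraction of exact label matches and R² = 1 - SS_res / SS_tot. The function names `accuracy` and `r_squared` are illustrative.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of reactions whose predicted coupling agent matches the recorded one."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only, not results from the study.
print(accuracy(["HATU", "EDC", "T3P", "HATU"], ["HATU", "EDC", "HATU", "HATU"]))  # 0.75
print(r_squared([50.0, 70.0, 90.0], [55.0, 70.0, 85.0]))  # 0.9375
```

R² can be negative when predictions are worse than always predicting the mean yield. Read this way, an R² of 0.801 means the regressor's residual sum of squares is roughly 20% of the sum of squares around the mean yield on the evaluation data.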
📜Paper:
doi.org/10.26434/chemrxiv-20…
#MachineLearning #ChemicalSynthesis #AmideCoupling #ReactionOptimization #OpenSourceData