Diffusion-based generative model with scaffold-hopping strategy yields highly potent bioactive molecules
1 SMarT-Diff (Scaffold-based Multi-property Tuning Diffusion) is presented as a score-based diffusion framework for lead optimization that explicitly targets the classic tension between multi-objective property control and true scaffold-level exploration (scaffold hopping rather than close analog generation).
2 The most concrete outcome is wet-lab validation on LRRK2: three generated compounds were synthesized and tested in an ADP-Glo kinase assay, and the best candidate, lrrk2_m_1001, reached IC50 = 1.544 nM, roughly twofold more potent than the positive control LRRK2-IN-1 (IC50 = 3.141 nM).
3 The core idea is to condition generation on Bemis–Murcko scaffolds as structural priors, while simultaneously guiding toward drug-likeness (QED), synthetic accessibility (SA), and pharmacophore matching; for CNS-relevant tasks, predicted BBB permeability is added as an extra objective.
4 Architecturally, SMarT-Diff uses a graph diffusion transformer (DiT) inside a score-based generative model (SGM), denoising unified graph tokens with adaptive layer normalization (AdaLN) to better encode topology and scaffold–substituent relationships.
5 Sampling is a two-level system: an inner Reverse Diffusion Predictor plus Adaptive Momentum Corrector (RA) improves stability and “chemotype fidelity,” while an outer Advantage Actor-Critic (A2C) loop steers sampling using pharmacophore-matching rewards and explicitly penalizes excessive scaffold similarity to stay in a scaffold-hopping regime.
6 Ablations on LRRK2 dissect the trade-offs: scaffold-graph conditioning restores high validity (to about 0.953) and boosts success rate; RA raises scaffold similarity (up to about 0.732), while A2C pulls similarity back down toward the intended hopping window (centered at roughly 0.36–0.41) without collapsing diversity.
7 Scaffold-level out-of-distribution (OOD) generation is quantified at two abstraction levels: among 10,000 LRRK2 designs, 93.96% had Bemis–Murcko scaffolds unseen in training and 60.08% had novel generic scaffolds; nearest-neighbor scaffold similarity to the training set is centered around 0.4, consistent with a practical scaffold-hopping definition.
8 Importantly, novelty is not treated as a pure exploration metric: within a filtered “drug-like and strong docking” subset (e.g., Glide SP < −8.0 kcal/mol, QED > 0.6, SA > 0.6), molecules with scaffold similarity < 0.5 still retain strong predicted binding (median Glide SP ≈ −8.57 kcal/mol), suggesting that scaffold changes can preserve affinity.
9 Against scaffold-hopping baselines on LRRK2 (PMDM, DECOMPOPT, DRLinker, Tree-Invent, TurboHopp, DiffHopp), SMarT-Diff reports a balanced profile: novelty 1.000, validity 0.944, uniqueness 0.851, scaffold similarity 0.362 with diversity 0.749, the highest QED (0.640), strong SA (0.653), and the best success rate (0.629, under the criteria QED > 0.4 and SA > 0.6).
10 Beyond single-target kinase optimization (LRRK2, HPK1), the framework is shown to generalize to a GPCR (GLP-1R) without retraining and to dual-target design (GSK3β/JNK3) via MCS-mined shared cores plus pharmacophore matching, yielding candidates with favorable docking distributions and MM/GBSA support for dual-pocket stability.
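To make the outer A2C loop's similarity penalty more concrete, here is a minimal sketch of how a pharmacophore-matching reward could be discounted once a design sits too close to the reference scaffold. The functional form, the names `shaped_reward`, `sim_ceiling`, and `penalty_weight`, and the simplified one-step advantage are all assumptions for illustration; the summary above does not specify the paper's actual reward or critic. The default ceiling of 0.41 is simply the upper edge of the hopping window quoted above.

```python
def shaped_reward(pharm_match: float,
                  scaffold_sim: float,
                  sim_ceiling: float = 0.41,
                  penalty_weight: float = 1.0) -> float:
    """Hypothetical reward: pharmacophore match minus a hinge penalty.

    Only similarity above `sim_ceiling` is penalized, so designs that are
    already inside the hopping window keep their full reward.
    """
    excess = max(0.0, scaffold_sim - sim_ceiling)
    return pharm_match - penalty_weight * excess


def advantage(reward: float, value_estimate: float) -> float:
    """Simplified advantage for a terminal reward: A = r - V(s).

    Assumption: the reward arrives once per finished molecule, so the
    bootstrapped next-state value term is dropped.
    """
    return reward - value_estimate
```

With these assumed defaults, a design at the RA-only similarity level of about 0.732 would lose 0.322 reward units, while one at 0.36 would lose nothing, which is the direction of pressure the ablation describes.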
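The scaffold-level OOD numbers rest on two simple quantities: the fraction of generated scaffolds absent from training, and each design's nearest-neighbor similarity to the training scaffolds. The sketch below shows both under stated assumptions: scaffolds are already available as canonical strings and fingerprints as sets of on-bit indices (in practice these would come from a cheminformatics toolkit), and Tanimoto similarity is used, although the summary does not name the fingerprint or the metric.

```python
from typing import Iterable, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as on-bit sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbor_similarity(query: Set[int],
                                training: Iterable[Set[int]]) -> float:
    """Highest similarity between one design and any training scaffold."""
    return max((tanimoto(query, ref) for ref in training), default=0.0)


def unseen_fraction(generated: Sequence[str],
                    training: Iterable[str]) -> float:
    """Share of generated scaffold strings that never occur in training."""
    if not generated:
        return 0.0
    seen = set(training)
    return sum(s not in seen for s in generated) / len(generated)
```

Running `unseen_fraction` once on Bemis–Murcko scaffolds and once on generic scaffolds would give the two abstraction levels reported above; the generic level is stricter because it ignores atom and bond types.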
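The success-rate and "novel but still well-docked" figures are threshold filters over per-molecule scores, which can be sketched as follows. The thresholds are the ones quoted above; the `Design` record, the function names, and the reading of SA as a normalized score where higher means easier to synthesize are assumptions made for this illustration.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class Design:
    glide_sp: float      # docking score in kcal/mol (more negative is better)
    qed: float           # drug-likeness, 0 to 1
    sa: float            # assumed normalized SA, higher = easier to make
    scaffold_sim: float  # nearest-neighbor scaffold similarity to training


def success_rate(designs: Sequence[Design],
                 qed_min: float = 0.4,
                 sa_min: float = 0.6) -> float:
    """Fraction of designs passing both the QED and SA thresholds."""
    if not designs:
        return 0.0
    hits = sum(d.qed > qed_min and d.sa > sa_min for d in designs)
    return hits / len(designs)


def median_docking_of_hops(designs: Sequence[Design],
                           glide_max: float = -8.0,
                           qed_min: float = 0.6,
                           sa_min: float = 0.6,
                           sim_max: float = 0.5) -> Optional[float]:
    """Median Glide SP among drug-like, well-docked, low-similarity designs."""
    scores = [d.glide_sp for d in designs
              if d.glide_sp < glide_max and d.qed > qed_min
              and d.sa > sa_min and d.scaffold_sim < sim_max]
    return median(scores) if scores else None
```

Note that the two filters use different QED cutoffs (0.4 for the benchmark success rate, 0.6 for the stricter docking subset), so the two reported numbers are not computed over the same population.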
📜Paper:
doi.org/10.1002/advs.75674
#DrugDiscovery #GenerativeAI #DiffusionModels #MolecularGeneration #ScaffoldHopping #LeadOptimization #ComputationalChemistry #Cheminformatics #ReinforcementLearning #KinaseInhibitors