Revisiting Target-Aware de novo Molecular Generation with TarPass: Between Rational Design and Texas Sharpshooter
1. The paper argues that many “target-aware” de novo generators may not truly use target information, but instead risk a Texas Sharpshooter pattern: retrospectively rationalizing outputs using coarse metrics (e.g., docking) and cherry-picked examples.
2. To address this, the authors introduce TarPass, a curated benchmark designed for fair, target-grounded evaluation across paradigms. It includes 18 well-studied, pharmaceutically relevant targets (20 structures total), expert-annotated key interactions, and ~1000 experimentally validated actives per target (from BindingDB), plus a ChEMBL-random baseline to test whether models beat “just sample from a drug-like database.”
3. TarPass is explicitly built to reduce data leakage: targets are time-split (post-2019) and selected to avoid overlap with common structure–ligand training sets (CrossDocked2020, PDBbind). The benchmark frames generalization realistically as “within druggable families” (e.g., kinases) rather than assuming entirely novel folds.
4. The evaluation is holistic and standardized: generate up to 1000 unique molecules/target, run a consistent docking workflow (with special handling for 3D in situ initial poses), then score both protein–ligand interactions (PLIs) and molecular plausibility (validity, drug-likeness, synthesizability, structural alerts, and chemical-distance behavior).
5. Fifteen representative methods are benchmarked across three paradigms: non-3D (DeepBlock, DRAGONFLY, SimpleSBDD, TamGen), 3D in situ (DiffSBDD, DrugFlow, IPDiff, Lingo3DMol, MolCraft, PocketFlow, SurfGen, TargetDiff), and optimization-based variants (DrugFlow-PA, MolPilot, REINVENT). The study also reports practical deployability: runtime, validity, uniqueness, and input-structure compatibility.
6. Key PLI finding: 3D in situ methods show only a modest average advantage in docking/interaction metrics, and many do not significantly outperform the ChEMBL-random baseline across targets. Only a small subset of methods shows consistent gains, and even then performance can be sensitive to conditions like reliance on an input ligand (raising concerns about robustness/generalization).
7. Interaction recovery is used as a stricter test than docking score alone. Even reference ligands achieve only ~51% exact match (limited by docking/PLIP constraints), but most models perform close to the ChEMBL-random baseline on both exact match and match ratio; only a few (notably DrugFlow, MolCraft, and the optimized variants) approach reference-like interaction recovery.
8. Pose realism remains a bottleneck for 3D in situ generation: initial conformations frequently contain steric clashes, centroid placement errors correlate strongly with reduced interaction recovery, and certain targets expose systematic failure modes (e.g., incomplete pocket definitions causing clashes; metal coordination such as Zn in HDAC6 being mishandled or unsupported by some models).
9. Plausibility/drug-likeness trade-off: non-3D models (often benefiting from broader pretraining) tend to generate more drug-like and synthesizable molecules (higher QED, better SA scores, fewer medicinal-chemistry alerts) but show weaker target specificity in PLIs. Many graph-based 3D in situ models overproduce implausible stereochemistry and overly complex ring systems (e.g., highly fused rings), harming synthetic feasibility.
10. The paper proposes a practical post-processing strategy: a multi-tier virtual screening workflow that first applies hard filters on PLIs, plausibility, and drug-likeness, followed by softer refinement (experience-based filters, optional clustering/MD). In case studies (JAK2/TYK2), hard filters reduce libraries to ~10% and later steps narrow them to ~20–30 candidates, yielding some enrichment but still showing that filtering cannot substitute for improving pose accuracy, interaction fidelity, and plausibility in the generators themselves.
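A minimal sketch of the leakage guard described in point 3, assuming each target is represented by a release date and a set of identifiers (e.g., UniProt accessions or PDB IDs) that can be checked against identifiers found in training sets such as CrossDocked2020 or PDBbind. The function name, input layout, cutoff date, and exact-ID overlap rule are illustrative assumptions; real pipelines often also screen by sequence or pocket similarity, and TarPass's exact criteria may differ.

```python
from datetime import date


def leakage_safe_targets(targets, training_ids, cutoff=date(2019, 12, 31)):
    """Keep targets released after `cutoff` that share no identifier with training data.

    Hypothetical helper. targets: mapping of target name -> (release date, set of IDs).
    training_ids: set of IDs present in training sets (e.g. CrossDocked2020, PDBbind).
    """
    return sorted(
        name
        for name, (released, ids) in targets.items()
        # time split AND no identifier overlap with training data
        if released > cutoff and not (ids & training_ids)
    )
```

For example, a target released in 2021 passes, while one released in 2018, or one whose identifier already appears in PDBbind, is dropped.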
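The validity/uniqueness bookkeeping behind points 4 and 5 ("up to 1000 unique molecules/target") can be sketched as below. The `canonicalize` callable is a stand-in: in practice one might wrap RDKit (`Chem.MolFromSmiles`, which returns None for unparseable SMILES, followed by `Chem.MolToSmiles` for a canonical string), but here it is injected so the sketch runs with the standard library only. The helper name and return layout are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def validity_uniqueness(
    smiles: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
    max_keep: int = 1000,
) -> Tuple[float, float, List[str]]:
    """Return (validity, uniqueness, kept) for one target's generated SMILES.

    canonicalize maps a SMILES string to a canonical form, or None if invalid.
    """
    total = 0
    valid: List[str] = []
    for s in smiles:
        total += 1
        canon = canonicalize(s)
        if canon is not None:
            valid.append(canon)
    unique = list(dict.fromkeys(valid))  # order-preserving de-duplication
    validity = len(valid) / total if total else 0.0
    uniqueness = len(unique) / len(valid) if valid else 0.0
    # cap at the per-target budget of unique molecules
    return validity, uniqueness, unique[:max_keep]
```

With a toy canonicalizer that only strips whitespace and rejects strings starting with "!", the input `["CCO", " CCO", "c1ccccc1", "!bad"]` gives validity 0.75, uniqueness 2/3, and keeps `["CCO", "c1ccccc1"]`.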
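To make "do not significantly outperform the ChEMBL-random baseline" (point 6) concrete, here is one way it could be tested per target: a one-sided permutation test on mean docking scores (lower is better, Vina-style), with a Bonferroni correction across targets. This is an assumed stand-in, not necessarily the statistical procedure used in the paper; the function names, permutation count, and correction are illustrative.

```python
import random
from statistics import fmean


def permutation_p_value(model_scores, baseline_scores, n_perm=10_000, seed=0):
    """One-sided permutation test: is the model's mean docking score lower
    (better) than the ChEMBL-random baseline's? Hypothetical helper."""
    rng = random.Random(seed)
    observed = fmean(baseline_scores) - fmean(model_scores)  # > 0 when the model docks better
    pooled = list(model_scores) + list(baseline_scores)
    n_model = len(model_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = fmean(pooled[n_model:]) - fmean(pooled[:n_model])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps p strictly positive


def targets_beating_baseline(per_target, alpha=0.05):
    """per_target: target name -> (model_scores, baseline_scores).
    Bonferroni-corrects alpha across targets (an illustrative choice)."""
    threshold = alpha / len(per_target)
    return sorted(
        name
        for name, (model, baseline) in per_target.items()
        if permutation_p_value(model, baseline) < threshold
    )
```

A model that only matches the baseline's score distribution on a target yields a large p-value there, which is the pattern point 6 describes for many methods.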
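Point 7's two interaction-recovery metrics, sketched under the assumption that each interaction is a (type, residue) pair as a PLIP-style profiler would report it, that "match ratio" is the fraction of expert-annotated key interactions reproduced by a pose, and that "exact match" requires all of them. The paper's precise definitions may differ, and the residue labels used with it are placeholders.

```python
from typing import Iterable, List, Set, Tuple

Interaction = Tuple[str, str]  # (interaction type, residue id), e.g. ("hbond", "RES1")


def match_ratio(key: Set[Interaction], observed: Iterable[Interaction]) -> float:
    """Fraction of key interactions reproduced by one docked pose."""
    if not key:
        raise ValueError("need at least one key interaction")
    return len(key & set(observed)) / len(key)


def exact_match(key: Set[Interaction], observed: Iterable[Interaction]) -> bool:
    """True only if every key interaction is reproduced (extra contacts are allowed)."""
    return key <= set(observed)


def summarize(key: Set[Interaction], poses: List[Iterable[Interaction]]) -> Tuple[float, float]:
    """Return (exact-match rate, mean match ratio) over a set of poses."""
    poses = [set(p) for p in poses]
    exact_rate = sum(exact_match(key, p) for p in poses) / len(poses)
    mean_ratio = sum(match_ratio(key, p) for p in poses) / len(poses)
    return exact_rate, mean_ratio
```

Because exact match is all-or-nothing, it falls much faster than match ratio when a single key contact is missed, which is why it is the stricter of the two.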
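Two of the pose-realism checks in point 8, centroid placement error and a steric clash count, can be sketched with plain 3D coordinates. The 2.2 Å cutoff is a hypothetical simplification (more careful checks typically scale the threshold by per-element van der Waals radii), and hydrogens and metal-coordination exceptions such as the Zn site in HDAC6 are not handled.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(coords: Sequence[Coord]) -> Coord:
    """Geometric center of a set of atom coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_error(generated: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Distance (Å) between the generated pose's centroid and the reference ligand's."""
    return math.dist(centroid(generated), centroid(reference))


def count_clashes(ligand: Sequence[Coord], protein: Sequence[Coord], cutoff: float = 2.2) -> int:
    """Count ligand-protein atom pairs closer than `cutoff` Å (hypothetical threshold)."""
    return sum(1 for a in ligand for b in protein if math.dist(a, b) < cutoff)
```

For a generated ligand at (0,0,0) and (2,0,0) against a reference at (1,0,0) and (3,0,0), the centroid error is 1.0 Å; a protein atom at (0,1,0) counts as one clash with the first ligand atom only.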
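The multi-tier workflow in point 10 can be sketched as a sequence of hard filters followed by a soft ranking step. All thresholds, field names, and the ranking key below are hypothetical placeholders, not the paper's settings; properties such as QED and SA score are assumed to be precomputed, and the optional clustering/MD refinement is reduced to a simple sort.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    match_ratio: float  # fraction of key interactions recovered (0-1)
    n_clashes: int      # steric clashes in the docked pose
    n_alerts: int       # medicinal-chemistry structural alerts
    qed: float          # drug-likeness, 0-1, higher is better
    sa_score: float     # synthetic accessibility, 1 (easy) to 10 (hard)


# Hypothetical hard-filter tiers, applied in order: PLIs -> plausibility -> drug-likeness.
HARD_TIERS = [
    ("PLI", lambda c: c.match_ratio >= 0.5),
    ("plausibility", lambda c: c.n_clashes == 0 and c.n_alerts == 0),
    ("drug-likeness", lambda c: c.qed >= 0.5 and c.sa_score <= 4.0),
]


def screen(candidates, keep=30):
    """Apply hard tiers, then a soft ranking stand-in; returns (shortlist, survivors per tier)."""
    survivors = list(candidates)
    counts = {"input": len(survivors)}
    for tier, passes in HARD_TIERS:
        survivors = [c for c in survivors if passes(c)]
        counts[tier] = len(survivors)
    # Soft refinement stand-in: rank by interaction recovery, then QED.
    survivors.sort(key=lambda c: (c.match_ratio, c.qed), reverse=True)
    return survivors[:keep], counts
```

The `counts` dictionary records how far each tier shrinks the library, the kind of attrition the JAK2/TYK2 case studies report; as point 10 notes, no ordering of filters can recover good candidates that the generator never produced.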
📜Paper:
doi.org/10.1002/advs.75411
#ComputationalBiology #DrugDiscovery #GenerativeAI #MolecularGeneration #StructureBasedDrugDesign #Benchmarking #Docking #Cheminformatics #MachineLearning