A hybrid physics-deep learning framework for combinatorial de novo design of small-molecule binding proteins
1. The paper introduces CLAIRE (CombinatoriaL Assembly with Integrated REfinement), a hybrid workflow that aims to make de novo small-molecule binder design more reliable by combining explicit physics-based interaction scaffolding with deep-learning-guided sequence/structure optimization.
2. Core idea: instead of asking generative models to “discover” atom-level protein–ligand hydrogen bonding after the fact, CLAIRE defines high-fidelity interaction motifs up front and then searches for backbones that can accommodate those motifs with tight geometric tolerances (distances within ~0.5 Å; angles/torsions within ~10–15°).
3. Motif innovation: the authors extend BSFF (Binding Sites from Fragments) by mining the PDB for residue–fragment interactions, spatially clustering them into discrete “interaction modes,” and preferentially using statistically overrepresented modes (not necessarily those best-scoring by Rosetta), capturing preferences like pi-stacking and cation–pi that energy functions may underweight.
4. Scaffold innovation: the authors generalize motif scaffolding beyond helical bundles by using LUCS to generate thousands of reshaped de novo NTF2-like alpha–beta scaffolds with finely varied pocket geometries, mimicking how nature reuses folds by subtle geometric shifts around functional sites.
5. Combinatorial matching: motifs and scaffolds are screened at scale using Rosetta Match; buried ligand placements are kept (≤30% ligand SASA exposed), yielding high matching throughput (reported as >160 buried matches per input motif across five diverse small molecules), enabling large libraries of candidate complexes.
6. Refinement step 1 (physics, targeted): HBRefine is introduced to fix a common failure mode in small-molecule binder design: buried unsatisfied polar atoms. It (a) mutates extraneous buried polar residues to hydrophobics when favorable and (b) proposes local mutations that create new H-bonds to any unsatisfied ligand polar atoms, then repacks and accepts the changes only if the energy does not get worse.
7. Refinement step 2 (ML + physics): ProteinMPNN redesigns residues outside the binding site to restore global sequence–structure compatibility after pocket remodeling; Rosetta FastDesign then redesigns using MPNN-derived profiles, followed by filtering on both binding metrics (e.g., interface H-bonds, shape complementarity, ddG) and stability metrics (e.g., packstat, exposed hydrophobics, global polar satisfaction).
8. Quantitative takeaway (in silico): HBRefine plus ProteinMPNN increases the fraction of designs passing stringent multi-metric filters by up to ~7-fold. Compared with RoseTTAFold diffusion all-atom (RFdiffusionAA) pipelines on progesterone/estriol, CLAIRE yields higher in-silico pass rates; the diffusion designs most often fail on ligand H-bond satisfaction and on buried unsatisfied polar atoms at the interface.
9. Experimental validation on two similar steroids: 26 designs (13 estriol, 13 progesterone) were tested. All expressed solubly; ~58% were monomeric by SEC; ~31% were well-folded by 15N-HSQC. Binding by NMR chemical shift perturbations was observed for 1 estriol design and 3 progesterone designs, i.e., 4/26 binders overall (~15%, notably higher than typical sub-1% reports for fully generative workflows).
10. Structural and mechanistic support: NMR structures for A1E (apo/holo characterization) and D2P (holo) agree well with models (non-loop Cα RMSDs ~1.5 Å vs AF2). Motif-residue point mutations (e.g., A1E N43V/S45V/Y85F; D2P Y14F/T98V) reduce binding signals, supporting that the designed polar contacts are functionally important. Designed binding modes differ from human estrogen receptor binding solutions and show higher interaction density, indicating novelty rather than copying natural motifs.
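To make the matching criteria in points 2 and 5 concrete, here is a minimal Python sketch of a post-hoc check that keeps a candidate placement only if every motif geometry term is within tolerance and the ligand is sufficiently buried. This is not the Rosetta Match interface or the authors' code: the `Constraint` class, the function names, and the input values are hypothetical, and only the thresholds (0.5 Å, the 15° upper end of the quoted angle range, 30% exposed SASA) come from the summary above.

```python
from dataclasses import dataclass

# Thresholds quoted in the summary; all names below are hypothetical.
DIST_TOL_A = 0.5          # distance tolerance, in angstroms
ANGLE_TOL_DEG = 15.0      # upper end of the ~10-15 degree angle/torsion tolerance
MAX_EXPOSED_FRAC = 0.30   # keep placements with <=30% of ligand SASA exposed


@dataclass
class Constraint:
    """One motif geometry term: ideal value vs. value measured in a match."""
    kind: str        # "distance" (angstroms), or "angle"/"torsion" (degrees)
    ideal: float
    measured: float


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def within_tolerance(c: Constraint) -> bool:
    """Check a single geometry term against the quoted tolerances."""
    if c.kind == "distance":
        return abs(c.measured - c.ideal) <= DIST_TOL_A
    # Angles and torsions are periodic, so compare on the circle.
    return angular_diff(c.measured, c.ideal) <= ANGLE_TOL_DEG


def keep_match(constraints, ligand_sasa_bound: float, ligand_sasa_free: float) -> bool:
    """Keep a match if the ligand is buried and all geometry terms pass."""
    if ligand_sasa_free <= 0:
        raise ValueError("free-ligand SASA must be positive")
    exposed_frac = ligand_sasa_bound / ligand_sasa_free
    return exposed_frac <= MAX_EXPOSED_FRAC and all(
        within_tolerance(c) for c in constraints
    )


if __name__ == "__main__":
    # Made-up illustrative values, not data from the paper.
    motif = [
        Constraint("distance", ideal=2.9, measured=3.2),
        Constraint("torsion", ideal=175.0, measured=-178.0),
    ]
    print(keep_match(motif, ligand_sasa_bound=60.0, ligand_sasa_free=400.0))
```

The torsion term is compared on the circle so that 175° and -178° count as 7° apart rather than 353°; a naive subtraction would wrongly reject matches near the ±180° wrap-around.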
💻Code:
github.com/cvgalvin/CLAIRE
📜Paper:
biorxiv.org/content/10.64898…
#ProteinDesign #ComputationalBiology #Rosetta #ProteinMPNN #AlphaFold2 #NMR #DeNovoDesign #SmallMoleculeBinding #HybridModels #StructuralBiology