Hierarchical affinity landscape navigation through learning a shared pocket-ligand space
1. LigUnity is a foundation model for protein-ligand affinity prediction that unifies virtual screening and hit-to-lead optimization in a single shared embedding space, capturing both broad scaffold-level and fine-grained pharmacophore-level ligand interactions.
2. Unlike existing models that treat virtual screening and lead optimization separately, LigUnity jointly embeds ligands and protein pockets using scaffold discrimination and pharmacophore ranking to navigate a hierarchical affinity landscape.
3. In virtual screening, LigUnity outperforms 24 state-of-the-art methods across the DUD-E, DEKOIS, and LIT-PCBA benchmarks, with an improvement of over 50% in EF1% and a 10⁶× speedup over docking methods such as Glide-SP, without requiring binding poses.
4. The model maintains high performance even on novel targets with low sequence similarity (<30%) to training proteins, demonstrating robust generalization capabilities that surpass both structure-based and structure-free baselines.
5. For hit-to-lead optimization, LigUnity outperforms physics-based methods such as FEP and structure-based models like GenScore on Merck and JACS FEP benchmarks, showing strong predictive power in zero-shot and few-shot scenarios.
6. Even under challenging conditions where both ligands and proteins are dissimilar to training data, LigUnity improves r² by 38.1% over its sequence-only variant, confirming the value of incorporating explicit pocket structure.
7. Fine-tuning LigUnity with only partial binding data (as few as 4–16 ligands) yields competitive or superior accuracy to commercial tools like FEP (OPLS4), offering an efficient alternative for large-scale lead optimization.
8. To support the model, the authors curated PocketAffDB, the largest structure-aware binding assay dataset, with 0.8M affinity datapoints, 0.5M unique ligands, and 53,406 binding pockets, enabling structure-aware learning across diverse assays.
9. LigUnity includes a heterogeneous GNN that leverages a large pocket-ligand knowledge graph (16M pocket-pocket edges and 0.83M pocket-ligand edges) to refine query embeddings, improving screening performance by sharing information across similar pockets.
10. When integrated into an active learning framework for TYK2 optimization, LigUnity identifies high-affinity ligands within four iterations, achieving a 40% improvement in r² and discovering nanomolar hits with far fewer FEP calculations.
11. The model is interpretable: through residue and atom-level masking, LigUnity highlights pharmacophoric groups and pocket residues crucial for binding, aligning well with known crystallographic interactions.
12. Across split-by-time, split-by-scaffold, and split-by-unit settings in ChEMBL and BindingDB, LigUnity consistently outperforms other models, particularly excelling in underexplored settings like percentage-based assay formats.
13. LigUnity eliminates the need for 3D docking or pose generation, making it a practical and fast solution for real-world drug discovery pipelines that involve millions of ligands and diverse protein targets.
14. The study presents LigUnity as a general-purpose, structure-aware foundation model for computer-aided drug discovery, bridging the gap between early virtual screening and downstream optimization with a single efficient architecture.
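
Points 3 and 13 describe pose-free screening: ligands are ranked by how close their embeddings sit to the pocket embedding, and the ranking is judged by EF1%. A minimal sketch of that idea, assuming cosine similarity as the score and random toy vectors in place of real embeddings; `cosine_scores` and `enrichment_factor` are hypothetical helper names for illustration, not functions from the LigUnity repository.

```python
import numpy as np


def cosine_scores(pocket_vec, ligand_mat):
    """Cosine similarity between one pocket embedding and each ligand embedding.

    Assumption: similarity in the shared space is used directly as the
    screening score. The actual model may score differently.
    """
    pocket_vec = np.asarray(pocket_vec, dtype=float)
    ligand_mat = np.asarray(ligand_mat, dtype=float)
    p = pocket_vec / np.linalg.norm(pocket_vec)
    l = ligand_mat / np.linalg.norm(ligand_mat, axis=1, keepdims=True)
    return l @ p


def enrichment_factor(scores, is_active, fraction=0.01):
    """EF at `fraction`: active rate in the top-ranked subset / overall active rate."""
    scores = np.asarray(scores, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    # Size of the top-ranked subset (at least one ligand).
    n_top = max(1, int(round(fraction * len(scores))))
    # Sort by descending score and keep the top subset.
    top_idx = np.argsort(-scores, kind="stable")[:n_top]
    return is_active[top_idx].mean() / is_active.mean()


# Toy library: 1000 random "ligand embeddings", the first 10 nudged toward
# the pocket vector to play the role of actives. All values are synthetic.
rng = np.random.default_rng(0)
pocket = rng.normal(size=64)
ligands = rng.normal(size=(1000, 64))
labels = np.zeros(1000, dtype=bool)
labels[:10] = True
ligands[labels] += 2.0 * pocket

scores = cosine_scores(pocket, ligands)
print("EF1% on toy data:", enrichment_factor(scores, labels, fraction=0.01))
```

Because scoring reduces to one matrix-vector product over precomputed ligand embeddings, no pose has to be generated per ligand, which is where a large speedup over docking would come from. With 1% actives in the library, the maximum possible EF1% is 100.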
💻Code:
github.com/IDEA-XL/LigUnity
📜Paper:
biorxiv.org/content/10.1101/…
#DrugDiscovery #MachineLearning #VirtualScreening #DeepLearning #Bioinformatics #StructureBasedDesign #LigUnity #ComputationalBiology #AffinityPrediction