PoseX: AI Defeats Physics Approaches on Protein-Ligand Cross Docking
1. PoseX introduces a new large-scale benchmark for evaluating protein-ligand docking, focusing on the more practical and challenging cross-docking scenario—where AI methods now consistently outperform traditional physics-based tools.
2. The PoseX dataset includes 2,030 high-quality docking tasks: 718 for self-docking and 1,312 for cross-docking, curated from recent PDB entries to avoid overlap with model training data, enabling fair and future-facing evaluations.
3. The benchmark evaluates 22 methods across three categories: traditional physics-based docking (e.g., Glide, AutoDock Vina), AI docking (e.g., DiffDock, Uni-Mol, SurfDock), and AI co-folding (e.g., AlphaFold3, Protenix, Boltz-1).
4. AI docking models now lead in accuracy. SurfDock and Uni-Mol achieve up to 94% success on self-docking and 75% on cross-docking, significantly surpassing traditional tools like Glide, whose success rates sit at roughly 30–60%.
5. The authors introduce a robust energy minimization pipeline ("relaxation") that improves pose quality by correcting stereochemistry and reducing strain, boosting AI model accuracy by several percentage points.
6. AI models conditioned on binding pocket information (e.g., DiffDock-Pocket) outperform pocket-agnostic models, underscoring the importance of structure-aware context in realistic docking scenarios.
7. Co-folding models like AlphaFold3 and Boltz-1 show strong performance in self-docking but struggle in cross-docking because they model protein flexibility poorly, highlighting the need for pose-specific refinement in generalist architectures.
8. PoseX adopts the PB-valid checks from PoseBusters to assess physicochemical validity in addition to RMSD, ensuring predicted poses are chemically plausible, a key step toward usable docking outputs.
9. Detailed analysis shows that docking accuracy correlates with binding pocket similarity to training data. AI models like Chai-1 and Protenix are highly sensitive to pocket novelty, while traditional methods remain largely unaffected.
10. The benchmark supports a real-time leaderboard, a modular codebase, and rigorous evaluation protocols, offering a transparent and comprehensive platform for testing new docking algorithms under realistic conditions.
11. PoseX sets a new standard for benchmarking docking methods by shifting focus from simplified self-docking to challenging cross-docking tasks, driving progress toward generalizable, deployable AI models in drug discovery.
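To make the metrics in points 4 and 8 concrete, here is a minimal Python sketch of how a docking success rate is typically computed. It assumes the predicted and reference ligand poses share the same atom ordering, applies no superposition (poses are compared in the receptor frame), ignores symmetry-equivalent atom mappings, and uses the widely used 2 Å RMSD cutoff; the function names are hypothetical and this is not the PoseX codebase. The PB-valid flags are taken as given inputs rather than recomputed.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Heavy-atom RMSD between two poses with identical atom ordering.

    Assumption: no alignment and no symmetry correction, so atom i in
    `pred` is compared directly with atom i in `ref`.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    sq = sum(
        (p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2 + (p[2] - r[2]) ** 2
        for p, r in zip(pred, ref)
    )
    return math.sqrt(sq / len(pred))


def success_rates(
    rmsds: Sequence[float],
    pb_valid: Sequence[bool],
    threshold: float = 2.0,  # assumed cutoff in angstroms
) -> Tuple[float, float]:
    """Return (RMSD-only success rate, RMSD + PB-valid success rate)."""
    if len(rmsds) != len(pb_valid) or not rmsds:
        raise ValueError("inputs must be non-empty and the same length")
    hits = [r < threshold for r in rmsds]
    rmsd_rate = sum(hits) / len(hits)
    # A pose only counts under the joint metric if it is both close and valid.
    joint_rate = sum(h and v for h, v in zip(hits, pb_valid)) / len(hits)
    return rmsd_rate, joint_rate


if __name__ == "__main__":
    print(ligand_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
    print(success_rates([1.0, 2.5, 0.8, 3.1], [True, True, False, True]))  # (0.5, 0.25)
```

The joint rate can only be lower than or equal to the RMSD-only rate, which is why reporting both shows how many "accurate" poses are chemically implausible.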
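The pocket-novelty analysis in point 9 amounts to grouping test cases by how similar their binding pocket is to the training data and comparing success rates across groups. The sketch below assumes a hypothetical per-case similarity score in [0, 1] and arbitrary bin edges; the post does not specify PoseX's actual similarity measure or binning, and the inputs in the example are toy values, not benchmark results.

```python
from typing import Dict, Sequence, Tuple


def success_by_similarity(
    similarity: Sequence[float],
    rmsds: Sequence[float],
    edges: Sequence[float] = (0.0, 0.3, 0.6, 1.0),  # hypothetical bin edges
    threshold: float = 2.0,  # assumed RMSD cutoff in angstroms
) -> Dict[Tuple[float, float], float]:
    """Success rate per pocket-similarity bin; empty bins are omitted."""
    if len(similarity) != len(rmsds):
        raise ValueError("similarity and rmsds must have the same length")
    result: Dict[Tuple[float, float], float] = {}
    last = len(edges) - 2
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        # Bins are half-open [lo, hi); the last bin also includes its upper edge.
        members = [
            r
            for s, r in zip(similarity, rmsds)
            if lo <= s < hi or (i == last and s == hi)
        ]
        if members:
            result[(lo, hi)] = sum(r < threshold for r in members) / len(members)
    return result


if __name__ == "__main__":
    sims = [0.1, 0.2, 0.5, 0.9, 1.0]
    rms = [3.0, 1.5, 2.5, 0.5, 1.0]
    print(success_by_similarity(sims, rms))
    # {(0.0, 0.3): 0.5, (0.3, 0.6): 0.0, (0.6, 1.0): 1.0}
```

A method that is insensitive to pocket novelty would show roughly flat success rates across bins, while a sensitive one would show rates rising with similarity.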
💻Code:
github.com/CataAI/PoseX
📜Paper:
arxiv.org/abs/2505.01700
#DrugDiscovery #MolecularDocking #CrossDocking #AI4Science #StructuralBiology #ProteinLigand #DeepLearning #PoseX #AlphaFold3 #DiffDock #SurfDock