ProteomeScan: A Toolkit for Target Validation by Proteome-Wide Docking and Analysis
1. ProteomeScan is a gene-driven, proteome-wide “one compound–many targets” docking toolkit: it selects an optimal experimental PDB structure set per human gene, runs large-scale blind docking, and ranks candidate targets by predicted binding affinity.
2. Scale and coverage: starting from 82,861 UniProt human protein entries, the authors curate to 7,657 unique genes with experimental PDB structures (excluding NMR models and very large supercomplexes), then dock 20 drug-like ligands across the proteome (>300,000 docking tasks; ~93.65% completion).
3. Key practical innovation: a systematic “protein promiscuity” analysis to identify proteins that repeatedly appear as top hits across many ligands (broad binders that can dominate rankings and inflate false positives). A protein is labeled promiscuous if it ranks in the top m% for at least n of the N ligands, so the filter can be tuned through m and n.
4. Performance is evaluated with a proteome-coverage metric, Known Target Recovery (KTR): the fraction of known drug–target pairs recovered within the top m% of ranked predictions. ProteomeScan significantly outperforms random rankings (Mann–Whitney U p-values ~1e-4 to 1e-5), and moderate promiscuity filtering improves KTR (e.g., at top 5%: 13.64% baseline ProteomeScan vs 20.45% after removing 166 promiscuous targets).
5. Pose validation is treated as a first-class step, not an afterthought: docking hits are checked for whether ligands actually occupy druggable pockets using fpocket plus custom overlap metrics (e.g., % Ligand Inside Pocket; % Pocket Filled), typically focusing on the top 5–10 pockets by fpocket druggability.
6. Mutant-aware analysis: the study tests whether docking ranks clinically relevant mutant variants above wild type when expected. It succeeds for examples like dabrafenib preferring BRAF V600E over wild-type BRAF, and alpelisib ranking PIK3CA mutants (H1047R, E545K) above wild type. It also highlights failures where docking scores alone can be misleading without pocket-occupancy checks.
7. The paper explicitly maps where proteome-wide blind docking breaks down: assembly-dependent binding (paclitaxel requiring microtubule assembly; RMC-6236 requiring a tri-complex “molecular glue” mechanism) and cases where favorable scores come from non-druggable surface sites (revealed by low pocket occupancy).
8. Biological interpretation of promiscuous targets: the top promiscuous set (e.g., 166 proteins at top 25% across all 20 ligands) is enriched for families known for broad ligand binding (CYP3A4, GSTs, AKRs, ADH/ALDH enzymes, HSP90), plus potential “over-predicted” large/flexible pockets (e.g., some HDAC/ion channel/GPCR cases) flagged as needing experimental follow-up.
9. A secondary check on promiscuity realism: methylation sensitivity experiments on kinase-like promiscuous targets show methylated ligands often reduce predicted affinity, consistent with pocket-specific interactions; cases with minimal change suggest weaker/non-specific docking interactions.
10. Implementation and accessibility: the workflow is exposed via the DeepChem Server (FastAPI asynchronous jobs), scales via a cloud backend (Prithvi/AWS Batch in the commercial deployment), and the authors emphasize reproducibility by open-sourcing the core algorithm within the DeepChem ecosystem.
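The promiscuity rule in point 3 and the KTR metric in point 4 can be sketched in a few lines of Python. This is a minimal illustration, not the ProteomeScan code: the function names, the toy rankings, the rounding of the top-m% cutoff, and the choice to keep excluded known targets in the KTR denominator are all assumptions made here.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical data layout: ligand name -> targets ordered best score first.
Rankings = Dict[str, List[str]]


def top_fraction(ranked: List[str], m_percent: float) -> List[str]:
    """Return the best m% of a ranked list (assumed: round up, at least one)."""
    if not ranked:
        return []
    k = max(1, math.ceil(len(ranked) * m_percent / 100.0))
    return ranked[:k]


def promiscuous_targets(rankings: Rankings, m_percent: float, n_min: int) -> Set[str]:
    """Targets that fall in the top m% for at least n_min of the ligands."""
    counts: Dict[str, int] = {}
    for ranked in rankings.values():
        for target in top_fraction(ranked, m_percent):
            counts[target] = counts.get(target, 0) + 1
    return {target for target, count in counts.items() if count >= n_min}


def known_target_recovery(
    rankings: Rankings,
    known_pairs: List[Tuple[str, str]],
    m_percent: float,
    exclude: FrozenSet[str] = frozenset(),
) -> float:
    """Fraction of known (ligand, target) pairs found in the top m%.

    Assumption: excluded targets are dropped from each ranking before the
    cutoff is applied, and a known target that was excluded counts as a miss.
    """
    recovered = 0
    total = 0
    for ligand, target in known_pairs:
        if ligand not in rankings:
            continue
        total += 1
        filtered = [t for t in rankings[ligand] if t not in exclude]
        if target in top_fraction(filtered, m_percent):
            recovered += 1
    return recovered / total if total else 0.0


# Toy data (made up): one broad binder tops every ligand's ranking.
rankings = {
    "lig_a": ["P_broad", "T1", "T2", "T3"],
    "lig_b": ["P_broad", "T2", "T1", "T3"],
    "lig_c": ["P_broad", "T3", "T2", "T1"],
}
known_pairs = [("lig_a", "T1"), ("lig_b", "T2"), ("lig_c", "T1")]

promiscuous = promiscuous_targets(rankings, m_percent=25, n_min=3)
ktr_baseline = known_target_recovery(rankings, known_pairs, m_percent=25)
ktr_filtered = known_target_recovery(
    rankings, known_pairs, m_percent=25, exclude=frozenset(promiscuous)
)
print(promiscuous, ktr_baseline, ktr_filtered)
```

On this toy data the broad binder takes the top slot for every ligand, so KTR at the top 25% is 0 before filtering and 2/3 once it is removed, which mirrors the direction of the effect reported in point 4.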
💻Code:
github.com/deepforestsci/Pro…
📜Paper:
biorxiv.org/content/10.64898…
#ComputationalBiology #DrugDiscovery #MolecularDocking #TargetIdentification #ChemicalBiology #Bioinformatics #DeepChem #Proteomics #Polypharmacology #Toxicology