Rethinking the Generalization of Drug Target Affinity Prediction Algorithms via Similarity Aware Evaluation
1. This paper challenges the prevailing evaluation schemes in drug-target affinity (DTA) prediction, showing that conventional random splits produce over-optimistic results because their test sets are dominated by samples that are highly similar to the training data.
2. The authors introduce Similarity Aware Evaluation (SAE), a principled framework for creating test splits with user-defined similarity distributions, enabling more realistic and robust assessments of model generalization to novel compounds.
3. In standard random splits, over 88% of test samples are highly similar to the training data. Models such as SAM-DTA degrade sharply on low-similarity samples (e.g., R² falls from 0.65 to -0.64), yet aggregate metrics hide this drop.
4. SAE formulates test set construction as a constrained optimization problem. It relaxes the discrete choice of test samples into continuous weights and replaces the similarity and bin-counting functions with differentiable approximations, so the split can be optimized with gradients, making test set construction flexible and efficient.
5. The authors propose both "balanced splits", which are uniform across similarity bins, and "mimic splits", which replicate the similarity distribution of an external test set. Both strategies reflect real-world generalization significantly better than random, scaffold, and SIMPD splits.
6. Evaluations across four datasets (EGFR, BACE1, Carbonic anhydrase I/II) and five models (SAM-DTA, PharmHGT, FusionDTA, MolCLR, ChemBERTa) confirm that SAE provides clearer insight into performance degradation across similarity levels.
7. SAE-based splits enable more accurate hyperparameter tuning: models tuned on mimic splits perform better on external datasets than those tuned on conventional splits, underscoring SAE's value in practical deployment.
8. SAE supports additional use cases like generating test sets with predefined maximum similarity (e.g., <0.4 or <0.6), useful for avoiding IP conflicts or targeting novelty in drug screening.
9. The authors open-source their full implementation, providing a PyTorch-based optimizer for test split generation, and demonstrate how SAE can be extended to QSAR, ADMET, protein-protein interaction, and drug-drug interaction prediction tasks.
10. Overall, SAE reframes how generalization in DTA prediction should be evaluated, advocating for similarity-aware testing to ensure models perform reliably when exposed to unfamiliar chemical space.
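Item 3's point, that a single aggregate score can hide a collapse on dissimilar compounds, can be checked on any existing split by scoring each similarity bin separately. Below is a minimal sketch, assuming fingerprints are already computed and stored as Python sets of on-bit indices (in practice they would come from a cheminformatics toolkit such as RDKit) and assuming a test compound's similarity is its maximum Tanimoto similarity to the training set. Function names, bin edges, and toy numbers are illustrative, not taken from the SAE repository.

```python
from typing import Dict, List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity_to_train(test_fps: Sequence[Set[int]], train_fps: Sequence[Set[int]]) -> List[float]:
    """Assumed similarity score: each test compound's nearest-neighbour Tanimoto to the training set."""
    return [max(tanimoto(t, r) for r in train_fps) for t in test_fps]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


def r2_per_similarity_bin(
    sims: Sequence[float], y_true: Sequence[float], y_pred: Sequence[float], edges: Sequence[float]
) -> Dict[Tuple[float, float], float]:
    """Score each [lo, hi) similarity bin separately (the last bin also includes its upper edge)."""
    groups: Dict[Tuple[float, float], Tuple[List[float], List[float]]] = {}
    last = len(edges) - 2
    for s, y, p in zip(sims, y_true, y_pred):
        for k in range(len(edges) - 1):
            lo, hi = edges[k], edges[k + 1]
            if lo <= s < hi or (k == last and s == hi):
                ys, ps = groups.setdefault((lo, hi), ([], []))
                ys.append(y)
                ps.append(p)
                break
    # Skip bins with fewer than two samples, where R^2 is not meaningful
    return {b: r2_score(ys, ps) for b, (ys, ps) in groups.items() if len(ys) >= 2}


# Toy fingerprints: a near-duplicate, a partial match, and a novel compound
train = [{1, 2, 3, 4}, {5, 6, 7}]
test = [{1, 2, 3, 4}, {1, 2, 9, 10}, {8, 11, 12}]
sims = max_similarity_to_train(test, train)  # approx. [1.0, 0.333, 0.0]

# Six well-predicted similar compounds and two badly predicted dissimilar ones
toy_sims = [0.9, 0.95, 0.8, 0.85, 0.7, 0.99, 0.1, 0.2]
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 1.0, 2.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.0, 1.0]
per_bin = r2_per_similarity_bin(toy_sims, y_true, y_pred, edges=[0.0, 0.5, 1.0])
overall = r2_score(y_true, y_pred)
```

In this toy case the overall R² is about 0.92, while the low-similarity bin scores -3.0, which is the kind of gap that per-bin reporting makes visible.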
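The relaxation in item 4 can be illustrated with a small pure-Python sketch. It is not the SAE implementation (the authors provide a PyTorch optimizer), and it rests on several stated assumptions: each candidate's similarity to the training data is precomputed and held fixed (the paper also relaxes the similarity computation itself, which is omitted here); bin membership is approximated by a difference of two steep sigmoids with temperature `tau`; and the loss is the squared error between soft bin proportions and a target histogram, plus a penalty `lam` that ties the expected test-set size to `test_frac`. Gradients are derived by hand, and `optimize_test_weights` and all hyperparameter values are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_membership(s: float, edges: Sequence[float], tau: float) -> List[float]:
    """Differentiable stand-in for 'lo <= s < hi': a difference of two steep sigmoids per bin."""
    return [sigmoid((s - lo) / tau) - sigmoid((s - hi) / tau) for lo, hi in zip(edges[:-1], edges[1:])]


def optimize_test_weights(
    sims: Sequence[float], edges: Sequence[float], target: Sequence[float], test_frac: float,
    lam: float = 5.0, tau: float = 0.02, lr: float = 100.0, steps: int = 500,
) -> List[float]:
    """Hypothetical helper: gradient descent on relaxed weights w_i = sigmoid(z_i).

    Loss = sum_b (p_b - target_b)^2 + lam * (mean(w) - test_frac)^2,
    where p_b is the share of the soft test-set count that falls in similarity bin b.
    """
    n, nb = len(sims), len(target)
    m = [soft_membership(s, edges, tau) for s in sims]  # fixed, since sims are assumed precomputed
    z = [0.0] * n
    for _ in range(steps):
        w = [sigmoid(zi) for zi in z]
        counts = [sum(w[i] * m[i][b] for i in range(n)) for b in range(nb)]  # soft bin counts c_b
        total = sum(counts)
        p = [c / total for c in counts]
        # dL/dc_k = (2 / C) * [(p_k - t_k) - sum_b (p_b - t_b) * p_b]
        cross = sum((p[b] - target[b]) * p[b] for b in range(nb))
        g_c = [2.0 / total * ((p[k] - target[k]) - cross) for k in range(nb)]
        # Gradient of the size penalty with respect to each w_i (the same for every sample)
        g_size = 2.0 * lam * (sum(w) / n - test_frac) / n
        for i in range(n):
            g_w = sum(g_c[k] * m[i][k] for k in range(nb)) + g_size
            z[i] -= lr * g_w * w[i] * (1.0 - w[i])  # chain rule through w_i = sigmoid(z_i)
    return [sigmoid(zi) for zi in z]


# Toy pool of 100 candidates skewed toward high similarity, as in a random split
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
bin_sizes = [10, 15, 25, 50]
sims = [edges[b] + 0.125 for b, size in enumerate(bin_sizes) for _ in range(size)]
w = optimize_test_weights(sims, edges, target=[0.25] * 4, test_frac=0.2)
# Expected number of selected compounds per bin; a balanced 20-compound test set has about 5 per bin
expected = [sum(wi for wi, s in zip(w, sims) if lo <= s < hi) for lo, hi in zip(edges[:-1], edges[1:])]
```

The returned values act as per-compound inclusion probabilities. Turning them into a hard split (for example by sampling or thresholding) is a further step that this sketch leaves open.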
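Items 5 and 8 differ mainly in the target similarity distribution that the split should match. The minimal sketch below builds three such targets over user-chosen bin edges. The first is a uniform "balanced" target. The second is a "mimic" target copied from the binned similarities of an external test set, where each external compound's value is assumed to be its maximum similarity to the training data. The third is a capped target that puts zero mass on bins reaching above a maximum similarity. Expressing the maximum-similarity use case as a capped histogram is an assumption for illustration, not necessarily how SAE implements it, and all function names are hypothetical.

```python
from typing import List, Sequence


def bin_index(s: float, edges: Sequence[float]) -> int:
    """Index of the [lo, hi) bin containing s; the last bin also includes its upper edge."""
    for k in range(len(edges) - 1):
        if edges[k] <= s < edges[k + 1]:
            return k
    if s == edges[-1]:
        return len(edges) - 2
    raise ValueError(f"similarity {s} lies outside the bin edges")


def balanced_target(edges: Sequence[float]) -> List[float]:
    """'Balanced' target: the same share of test compounds in every similarity bin."""
    nb = len(edges) - 1
    return [1.0 / nb] * nb


def mimic_target(external_sims: Sequence[float], edges: Sequence[float]) -> List[float]:
    """'Mimic' target: copy the binned similarity profile of an external test set."""
    counts = [0] * (len(edges) - 1)
    for s in external_sims:
        counts[bin_index(s, edges)] += 1
    return [c / len(external_sims) for c in counts]


def capped_target(edges: Sequence[float], max_sim: float) -> List[float]:
    """Hypothetical cap: no mass in bins reaching above max_sim, uniform over the remaining bins."""
    allowed = [hi <= max_sim for hi in edges[1:]]
    if not any(allowed):
        raise ValueError("max_sim is below the first bin's upper edge")
    k = sum(allowed)
    return [1.0 / k if ok else 0.0 for ok in allowed]


edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
balanced = balanced_target(edges)                      # [0.2, 0.2, 0.2, 0.2, 0.2]
mimic = mimic_target([0.1, 0.3, 0.35, 0.9], edges)     # [0.25, 0.5, 0.0, 0.0, 0.25]
capped = capped_target(edges, max_sim=0.4)             # [0.5, 0.5, 0.0, 0.0, 0.0]
```

With the cap at 0.4, only the [0.0, 0.2) and [0.2, 0.4) bins receive mass, so every compound in a split matching this target has similarity below 0.4.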
💻Code:
github.com/Amshoreline/SAE
📜Paper:
arxiv.org/abs/2504.09481
#drugdiscovery #DTAprediction #machinelearning #QSAR #bioinformatics #generalisability #AIinbiotech #deeplearning #datasetbias #modelvalidation