Machine Learning Meets Pharmacokinetics: A Comparative Analysis of Predictive Models for Plasma Concentration-Time Profiles
1. This study systematically compares five machine learning (ML) approaches for predicting in vivo rat plasma concentration-time (PK) profiles from molecular structure, establishing a new benchmark for computational pharmacokinetics.
2. The standout performer was CMT-PINN, a physics-informed neural network trained directly on concentration-time data, which achieved the highest R²-log (0.854) and Spearman correlation (0.933) and the lowest values on most error metrics among the tested models.
3. PURE-ML, a decision tree model trained on log-transformed concentrations, also performed well, with a lower MAPE-log (22.1%) than CMT-PINN, though it recovered the overall profile shape slightly less robustly.
4. By contrast, traditional hybrid approaches such as NCA-ML and PBPK-ML performed much worse, especially in capturing late-phase PK behavior, which points to the limits of models trained on derived parameters.
5. All models were evaluated on the same rigorously curated dataset of 696 compounds and 14,155 plasma concentration points from Sanofi's in vivo studies, ensuring fair comparison under standardized metrics.
6. CMT-PINN and PURE-ML both predicted over 60% of concentration points within two-fold of the observed value, while NCA-ML and CMT-ML captured fewer than 12%, highlighting how much the choice of modeling strategy matters.
7. The authors emphasize that training directly on raw concentration-time data, rather than on NCA or compartmental model parameters, leads to markedly improved predictive accuracy, which is especially valuable for small datasets.
8. While PBPK-ML offers mechanistic interpretability, its prediction quality was highly variable and dependent on in vitro parameter prediction, highlighting the challenge of translating early ADME data into robust PK simulations.
9. Graph convolutional neural networks (GCNNs) were used in NCA-ML and CMT-ML to predict PK parameters from SMILES strings, but their performance suffered from poor parameter identifiability and overfitting risks.
10. The CMT-PINN approach required fewer assumptions, a smaller architecture, and no preprocessing of PK parameters, making it both more accurate and more computationally efficient than many alternatives.
11. Key PK metrics (AUC, C0, Cmin) were best predicted by CMT-PINN and PURE-ML, showing minimal bias, while NCA-ML and PBPK-ML systematically under- or overestimated these values.
12. This paper highlights the importance of model interpretability, evaluation across the entire PK curve, and standardized comparison frameworks, all crucial for transitioning these models into real-world discovery pipelines.
13. Future directions include integrating uncertainty quantification, defining model applicability domains, incorporating non-linear ADME mechanisms, and adopting FAIR data practices to improve transparency and adoption.
14. This comprehensive evaluation sets the stage for an "Applied AI Factory" in drug discovery, where validated ML models can reliably inform early PK predictions, reduce in vivo testing, and accelerate design-make-test cycles.
📜Paper:
biorxiv.org/content/10.1101/… #Pharmacokinetics #DrugDiscovery #MachineLearning #PINN #ADME #PKModeling #ComputationalPharmacology #GraphNN #Bioinformatics #AI4Science