Graph neural networks that read bacterial genomes to predict antibiotic resistance
Antimicrobial resistance kills over a million people every year. When a patient arrives with a severe bacterial infection, clinicians need to know which antibiotics will work—fast. Culture-based susceptibility testing takes 2 to 5 days. Whole-genome sequencing offers a shortcut, but translating raw bacterial DNA into reliable resistance predictions is far from trivial.
Bacterial genomes can be represented in many ways—SNPs, reference-free unitigs, image-like frequency chaos game representations (FCGR)—and there is no consensus on which works best. Worse, bacteria reproduce clonally, so standard ML models often learn to recognise high-risk lineages rather than the actual resistance mechanisms.
Nguyen and coauthors tackle both problems with AMR-GNN, a graph neural network that integrates multiple genomic representations simultaneously. Unitig features serve as node features; SNP- and FCGR-derived pairwise distances define the graph edges. Two parallel GCN modules learn from the same nodes but different connectivity structures, and their embeddings are fused before a final resistance/susceptibility classification.
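To make the two-branch idea concrete, here is a minimal forward-pass sketch in NumPy. It is not the authors' implementation: the k-nearest-neighbour rule for turning distances into edges, the single GCN layer per branch, fusion by concatenation, the random weights, and every function and parameter name are assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(dist, k):
    """Symmetric k-nearest-neighbour adjacency from a pairwise distance matrix (assumed edge rule)."""
    n = dist.shape[0]
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)                 # never pick a node as its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]      # indices of the k closest isolates per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)               # make the graph undirected

def normalize(adj):
    """Standard GCN normalisation: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution followed by ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)

def forward(x, dist_snp, dist_fcgr, params, k=3):
    """Same node features, two different graphs, embeddings fused by concatenation."""
    h_snp = gcn_layer(normalize(knn_adjacency(dist_snp, k)), x, params["w_snp"])
    h_fcgr = gcn_layer(normalize(knn_adjacency(dist_fcgr, k)), x, params["w_fcgr"])
    fused = np.concatenate([h_snp, h_fcgr], axis=1)
    logits = fused @ params["w_out"]
    return 1.0 / (1.0 + np.exp(-logits))        # probability of resistance per isolate

def toy_dist(rng, n):
    """Toy pairwise distances from random 2-D points (stand-in for SNP or FCGR distances)."""
    pts = rng.normal(size=(n, 2))
    return np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

rng = np.random.default_rng(0)
n, f, h = 8, 16, 4                              # isolates, unitig features, hidden units
x = rng.integers(0, 2, size=(n, f)).astype(float)   # toy unitig presence/absence matrix
dist_snp, dist_fcgr = toy_dist(rng, n), toy_dist(rng, n)
params = {
    "w_snp": 0.1 * rng.normal(size=(f, h)),
    "w_fcgr": 0.1 * rng.normal(size=(f, h)),
    "w_out": 0.1 * rng.normal(size=(2 * h,)),
}
probs = forward(x, dist_snp, dist_fcgr, params)
```

The point of the sketch is that both branches see the same unitig matrix `x` and differ only in which isolates count as neighbours, so each branch aggregates information over a different notion of genomic similarity.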
Tested on 2,515 Pseudomonas aeruginosa isolates across 12 antibiotics, AMR-GNN significantly outperforms single-representation models for 11 of the 12 drugs, with AUROC gains of 28.8% for cefepime and 18.9% for aztreonam, precisely where prediction is hardest. The authors also apply a structural fix for clonal confounding: removing edges between isolates of the same sequence type, which forces the model to learn from genetically distinct neighbours. This improves performance further across all tested antibiotics.
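The edge-removal step can be sketched as a simple mask over the adjacency matrix. This is an illustrative reading of the description above, not the paper's code; the function name, the toy graph, and the sequence-type labels are made up for the example.

```python
import numpy as np

def drop_same_st_edges(adj, sequence_types):
    """Zero out every edge whose two endpoints share a sequence type (ST)."""
    st = np.asarray(sequence_types)
    same_st = st[:, None] == st[None, :]    # True where isolates i and j share an ST
    # The diagonal is also masked; GCN normalisation typically adds self-loops back.
    return np.where(same_st, 0.0, adj)

# Toy graph of four isolates; the ST labels are illustrative only.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)
sts = ["ST235", "ST235", "ST111", "ST111"]

pruned = drop_same_st_edges(adj, sts)
# Edges 0-1 and 2-3 connect isolates of the same ST and are removed;
# the cross-lineage edges 0-2, 1-2 and 1-3 survive.
```

After pruning, every message a node receives comes from a different lineage, so lineage membership alone can no longer be propagated through the graph.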
When validated on 23,000 genomes spanning E. coli, K. pneumoniae, S. aureus, and E. faecium, the model achieves mean AUROCs above 0.90 in nearly every species-drug combination. It also recovers known resistance genes (gyrA, gyrB, parC for levofloxacin; fusA1 for tobramycin) through integrated gradients analysis, without any prior AMR knowledge encoded in the architecture.
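For readers unfamiliar with integrated gradients, the sketch below shows the attribution rule on a deliberately tiny stand-in: a logistic model over four binary features, with an all-zeros baseline. The paper applies the method to the trained GNN; the model, weights, baseline choice, and all names here are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy stand-in for the classifier: logistic regression over feature vector x."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Average the gradient along the straight path from baseline to x, then scale by (x - baseline)."""
    alphas = (np.arange(steps) + 0.5) / steps           # midpoint rule on [0, 1]
    path = baseline + alphas[:, None] * (x - baseline)  # interpolated inputs
    grads = np.array([model_grad(p, w, b) for p in path])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([2.0, -1.0, 0.0, 0.5])   # hypothetical weights for four features
b = -0.5
x = np.array([1.0, 1.0, 0.0, 1.0])    # features present in one isolate
baseline = np.zeros_like(x)           # "no features present" reference input

attr = integrated_gradients(x, baseline, w, b)
# Completeness property: attributions sum (approximately) to the change in model output.
gap = abs(attr.sum() - (model(x, w, b) - model(baseline, w, b)))
```

Features with large positive attributions push the prediction toward resistance relative to the baseline; mapping such features back to the genes they fall in is how an analysis of this kind can surface loci like gyrA or fusA1.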
Multi-representation learning, graph-based relational structure, and built-in interpretability: three historically separate challenges, addressed in a single framework.
Paper:
nature.com/articles/s41467-0…