Biology AI Daily

May 28

Multilabel prediction of virus target proteins via multimodal graph representation learning

1. MultiVTP reframes virus target protein (VTP) identification as a multilabel problem over host proteins: a single human protein can be targeted by multiple viruses, and prediction uses only intrinsic host information (no viral proteins required).

2. The core idea is to learn host susceptibility signals from the human PPI network plus multimodal protein descriptors, then output a vector of virus-specific targeting probabilities for each host protein (species-level and family-level labels).

3. Architecture overview: (i) multi-view subgraph sampling around each query protein via repeated random walks, (ii) feature extraction (network topology plus multimodal descriptors), (iii) Graphormer-based integration inside each subgraph, (iv) Progressive Layered Extraction (PLE) to separate shared from virus-specific binding patterns for multilabel prediction.

4. Network topology is treated at two scales: global roles via node2vec embeddings (256D) and local positions via shortest-path distance encodings used as attention bias in Graphormer. Ablations show that global topology and the Graphormer module are the largest performance drivers.

5. Multimodal protein features combine (a) traditional curated features (sequence composition, evolutionary conservation metrics such as dN/dS and protein age, predicted secondary structure and solvent accessibility, and classic network centralities), (b) sequence embeddings from ESM2, and (c) functional embeddings from GO text encoded by PubMedBERT and aggregated with a GCN over a GO-similarity graph.

6. The PLE multilabel head explicitly models cross-virus commonalities (shared expert) and virus-specific signatures (task-specific experts plus gating). It improves over simpler multilabel strategies (binary relevance, classifier chains, label powerset) and over replacing PLE with a plain MLP.

7. Interpretability: Graphormer self-attention assigns higher attention to VTPs than to non-VTPs. Proteins with high attention are enriched for host–virus interaction, innate immunity, and antiviral defense processes, suggesting the model prioritizes biologically relevant neighborhoods rather than arbitrary graph proximity.

8. Benchmarking highlights: MultiVTP outperforms the host-only method HIVPRE for HIV-1 target prediction (reported gains in both AUC and AUPR), beats multiple multilabel baselines (MLP, XGB, RF, SVM with standard multilabel strategies), and remains comparatively robust when training positives are downsampled.

9. Few-shot setting: for viruses with only 20–100 known targets, training from scratch already outperforms strong baselines, while fine-tuning a pre-trained MultiVTP model yields large gains (example noted: AAV2 AUPR improves from the from-scratch model to the fine-tuned one), supporting adaptation to emerging or understudied viruses.

10. Human proteome application: scoring 20,270 UniProt proteins enables systematic nomination of novel VTP candidates per virus, as well as candidates targeted by multiple viruses (MVTPs). Case studies (e.g., H1N1, HIV-1) show that predicted candidates tend to connect to known VTPs in the PPI network and are enriched in both known and additional pathways. MVTP candidates show higher conservation and more central network positions, suggesting potential as broad-spectrum antiviral targets.

💻Code: github.com/hzau-liulab/Multi…

📜Paper: doi.org/10.1371/journal.pcbi…

#ComputationalBiology #Bioinformatics #GraphNeuralNetworks #GraphTransformer #MultiLabelLearning #HostVirusInteractions #Proteomics #SystemsBiology #MachineLearning #DeepLearning
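
To make the subgraph sampling and local-position encoding steps more concrete, here is a minimal Python sketch. It is not the authors' code: the toy graph `TOY_PPI`, the function names, and the parameters `walk_length`, `num_walks`, and `seed` are all hypothetical. It samples one subgraph "view" around a query protein with repeated random walks, then computes pairwise hop distances inside that subgraph. In a Graphormer-style model, such distances would typically index a learned bias added to the attention scores; that learned part is not shown here.

```python
import random
from collections import deque

# Hypothetical toy PPI network as an undirected adjacency list (not real data).
TOY_PPI = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}

def sample_subgraph(adj, query, walk_length=4, num_walks=3, seed=0):
    """Return the sorted set of nodes visited by random walks from `query`."""
    rng = random.Random(seed)
    visited = {query}
    for _ in range(num_walks):
        node = query
        for _ in range(walk_length):
            neighbors = adj.get(node, [])
            if not neighbors:
                break
            node = rng.choice(neighbors)
            visited.add(node)
    return sorted(visited)

def shortest_path_distances(adj, nodes, unreachable=-1):
    """Pairwise hop distances within the subgraph induced by `nodes` (BFS)."""
    node_set = set(nodes)
    index = {n: i for i, n in enumerate(nodes)}
    dist = [[unreachable] * len(nodes) for _ in nodes]
    for src in nodes:
        row = dist[index[src]]
        row[index[src]] = 0
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj.get(cur, []):
                # Only follow edges that stay inside the sampled subgraph.
                if nxt in node_set and row[index[nxt]] == unreachable:
                    row[index[nxt]] = row[index[cur]] + 1
                    queue.append(nxt)
    return dist

# Different seeds give different "views" of the same query protein.
views = [sample_subgraph(TOY_PPI, "A", seed=s) for s in range(3)]
spd = shortest_path_distances(TOY_PPI, views[0])
```

Each entry `spd[i][j]` is the hop count between the i-th and j-th nodes of the sampled view, with -1 marking pairs that are not connected inside the subgraph.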

GitHub - hzau-liulab/MultiVTP
