StoPred: Accurate Stoichiometry Prediction for Protein Complexes Using Protein Language Models and Graph Attention
1. StoPred is a new method for predicting the stoichiometry of protein complexes. It integrates protein language models (pLMs) with graph attention networks (GATs) to model subunit-level interactions, predicting both homo- and hetero-oligomer stoichiometry directly from sequence or structure features without requiring template assemblies or a predefined composition.
2. StoPred outperforms existing methods in both accuracy and efficiency. On the held-out test dataset, it achieves up to 16% higher top-1 accuracy for homomeric complexes and 41% higher for heteromeric complexes than the strongest prior method, a substantial improvement over both template-based and deep learning-based approaches.
3. The method leverages protein language models to capture structural and functional features embedded within amino acid sequences. By using a graph attention network, StoPred can model dependencies between different subunits in a protein complex, which is crucial for predicting the stoichiometry of hetero-oligomeric complexes.
4. StoPred was benchmarked against a range of deep learning-based and template-based methods on curated and blind datasets. It consistently showed superior performance, especially on heteromeric complexes, which are more complex and biologically important yet often harder to predict.
5. The study also includes case studies that highlight the advantages of StoPred over AlphaFold3 score-based selection. For example, StoPred correctly identifies the stoichiometry of certain protein complexes, whereas AlphaFold3 may assign higher ranking scores to incorrect models due to its focus on structural accuracy rather than stoichiometry correctness.
6. StoPred is designed to be computationally efficient, making it a practical tool for predicting the stoichiometry of protein complexes with unknown or uncertain composition. It can guide the setup of high-resolution modeling and support downstream structural and functional analysis.
7. Future work includes improving structure-based embeddings to capture more features and developing multi-modal models that integrate template information, sequence, and structural features in a unified framework. The authors also plan to extend the method to protein–nucleotide complexes.
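To make the graph-attention idea in point 3 concrete, here is a minimal single-head attention pass over a toy subunit graph, in plain Python. It is an illustrative sketch only, not StoPred's implementation: the function name `graph_attention`, the attention vectors `a_src` and `a_dst`, and the tiny 2-D "embeddings" are all hypothetical stand-ins, and the learned weight matrix of a full GAT layer is omitted (treated as the identity).

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def graph_attention(features, edges, a_src, a_dst):
    """One single-head, GAT-style attention pass (hypothetical sketch).

    features: dict mapping subunit id -> list of floats
              (stand-in for a pLM embedding of that subunit)
    edges:    set of undirected (i, j) pairs between subunits
    a_src, a_dst: attention vectors (would be learned; assumed here)
    """
    nodes = sorted(features)
    # Every node attends to itself plus its graph neighbours.
    neighbors = {n: [n] for n in nodes}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    updated = {}
    for i in nodes:
        # Unnormalised score for each neighbour j of node i.
        scores = [
            leaky_relu(dot(a_src, features[i]) + dot(a_dst, features[j]))
            for j in neighbors[i]
        ]
        weights = softmax(scores)
        dim = len(features[i])
        # New representation: attention-weighted sum of neighbour features.
        updated[i] = [
            sum(w * features[j][d] for w, j in zip(weights, neighbors[i]))
            for d in range(dim)
        ]
    return updated
```

With zero attention vectors every neighbour gets equal weight, so each subunit's updated vector is simply the average over itself and its neighbours; non-zero vectors let some subunits contribute more than others, which is the kind of dependency between subunits that point 3 refers to.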
📜Paper:
biorxiv.org/content/10.1101/…
#ProteinComplexes #StoichiometryPrediction #ProteinLanguageModels #GraphAttentionNetworks #ComputationalBiology