Biology AI Daily

Apr 29

PreStoi allows accurate prediction of protein complex stoichiometry by integrating AlphaFold3 and template information

1. The paper addresses a practical bottleneck in multimer structure prediction: AlphaFold-style complex modeling typically assumes stoichiometry is known, but many real complexes lack reliable copy-number annotations. PreStoi targets the upstream task: predicting how many copies of each subunit are present.

2. Core idea: treat stoichiometry as a discrete search-and-rank problem. PreStoi (i) proposes plausible stoichiometry candidates, (ii) builds AlphaFold3 models for each candidate, (iii) ranks candidates using AlphaFold3 ranking scores, and (iv) refines decisions using homologous complex-template evidence when available.

3. CASP16 Phase 0 (blind stoichiometry) results: using the MULTICOM_AI predictor with PreStoi, the correct stoichiometry was ranked top-1 for 71.4% of targets (20/28) and appeared in the top-3 for 92.9% (26/28). Compared with other CASP16 approaches, performance was strongest when considering both top-1 accuracy and “any-submitted” accuracy.

4. The integration matters: AlphaFold3-only ranking was substantially weaker than the combined system. On CASP16 targets, AF-max (max AF3 ranking score) achieved 50% top-1, AF-avg (average score) 42.9%, and template-based prediction alone 50%; the integrated decision rule reached 71.4% top-1.

5. Why templates help: when significant complex templates exist, template-based stoichiometry is often more precise for top-1 selection (82.4% accuracy on applicable CASP16 targets), and it can override AF3 ranking mistakes. Templates also cover cases where AF3 cannot be run (e.g., very large complexes exceeding the AlphaFold3 web-server 5000-residue limit).

6. Why AlphaFold3 helps: templates are missing for a substantial fraction of targets (about 39% in CASP16). In those cases, AF3 ranking provides a workable signal to narrow down candidates, and AF-max/AF-avg can still succeed without a full complex template spanning all subunits.

7. A key failure mode is “compatible stoichiometries” that yield highly similar structures, especially hetero-dimer (A1B1) vs hetero-tetramer (A2B2). The models can be partially or fully superimposable, so AF3 ranking scores (and sometimes template heuristics) struggle to separate them confidently.

8. Additional hard cases highlight missing priors beyond AF3 ranking scores: (i) hierarchical oligomerization (subunits form homo-oligomers first, then assemble), (ii) symmetry considerations (symmetric stoichiometries may be more plausible than asymmetric high-scoring ones), and (iii) filaments with large or variable copy numbers (e.g., A9B18), where candidate enumeration becomes intractable without strong external constraints.

9. Large-scale benchmark (P16-BMS; 2,014 post-CASP16 “blind” PDB complexes): automated template evidence alone reached 46.2% top-1 (64.9% top-3). For hetero-multimers, using AlphaFold3 to re-rank the top-10 template-derived candidates improved top-1 accuracy into the ~46–49% range, illustrating complementarity between template evidence (candidate generation/prior) and AF3 scores (re-ranking signal).

10. Method details of the automated pipeline: templates are found per subunit via MULTICOM4 (Jackhmmer/MSA, HHsearch). Candidate copy numbers are supported by template hits and weighted by −log10(E-value). A multi-subunit template promotion step boosts candidates consistent with partial/full complex templates spanning multiple subunits, while filtering out candidates exceeding the 5000-residue AF3 feasibility limit.

💻Code: github.com/jianlin-cheng/pre…
📜Paper: doi.org/10.1038/s42003-026-1…

#ProteinComplexes #Stoichiometry #AlphaFold3 #CASP16 #StructuralBioinformatics #ComputationalBiology #ProteinStructure #MultimerPrediction #Templates #Benchmarking
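
To make points 4 to 6 and 10 concrete, here is a minimal Python sketch of how template evidence weighted by −log10(E-value) might be combined with an AF-max fallback. It is not PreStoi's actual code: the function names, the input shapes, the `min_template_score` threshold, and the "trust a strong template, otherwise use AF-max" rule are all assumptions made for illustration. The only details taken from the post are the −log10(E-value) weighting, the AF-max idea, and the 5000-residue limit.

```python
import math
import re
from collections import defaultdict

AF3_RESIDUE_LIMIT = 5000  # AlphaFold3 web-server limit quoted in the post


def parse_stoichiometry(label):
    """Turn a label such as 'A2B2' into {'A': 2, 'B': 2}."""
    return {chain: int(n) for chain, n in re.findall(r"([A-Z])(\d+)", label)}


def total_residues(label, subunit_lengths):
    """Total residue count of a candidate, given per-subunit lengths."""
    counts = parse_stoichiometry(label)
    return sum(subunit_lengths[chain] * n for chain, n in counts.items())


def template_scores(hits):
    """Sum -log10(E-value) over the template hits supporting each candidate.

    `hits` is a hypothetical list of (stoichiometry_label, e_value) pairs.
    """
    scores = defaultdict(float)
    for label, e_value in hits:
        # Clamp tiny E-values so log10 never sees zero.
        scores[label] += -math.log10(max(e_value, 1e-300))
    return dict(scores)


def pick_stoichiometry(hits, af3_scores, subunit_lengths,
                       min_template_score=10.0):
    """Assumed decision rule: strong template first, else AF-max.

    `af3_scores` maps each candidate label to the AF3 ranking scores of
    its models; `min_template_score` is an illustrative threshold.
    """
    t_scores = template_scores(hits)
    if t_scores:
        best, score = max(t_scores.items(), key=lambda kv: kv[1])
        if score >= min_template_score:
            return best, "template"

    # AF-max: keep the best ranking score per candidate, skipping
    # candidates too large to model under the residue limit.
    af_max = {
        label: max(scores)
        for label, scores in af3_scores.items()
        if scores
        and total_residues(label, subunit_lengths) <= AF3_RESIDUE_LIMIT
    }
    if af_max:
        return max(af_max, key=af_max.get), "af3"
    return None, "none"
```

In this sketch the residue limit is applied only on the AF3 branch, following the post's remark that templates can still cover complexes too large for AF3. For example, two hits supporting A2B2 with E-values 1e-30 and 1e-20 give a template score of about 50, which clears the assumed threshold, so A2B2 is returned from the template branch without consulting AF3 scores.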
