Benchmarking MSA pairing for protein-protein complex structure prediction reveals a depth-over-pairing principle
1. The study benchmarks whether "paired MSAs" are truly necessary for accurate protein complex prediction in AlphaFold3 and arrives at a practical rule: MSA depth matters more than enforcing correct inter-chain pairing.
2. On a stringent benchmark of 439 non-redundant heterodimers (HD439; filtered to avoid similarity to AF3 training interfaces), adding species-paired MSAs (pMSA) only slightly improves mean DockQ over unpaired concatenated monomer MSAs (mMSA): 0.613 vs 0.602.
3. The key control is "shuffle pairing" (sMSA): the paired sequences are randomly permuted to break inter-chain coevolution while preserving depth and composition. sMSA matches pMSA almost exactly (mean DockQ 0.612; P=0.96), implying that the small gain from pMSA comes largely from the extra sequences (depth), not from correct pairing constraints.
4. Inter-species complexes are substantially harder than intra-species ones (pMSA mean DockQ 0.545 vs 0.632), consistent with weaker or absent coevolution and much shallower paired MSAs. Notably, for inter-species targets sMSA can outperform pMSA (0.561 vs 0.545), suggesting that species-based pairing can inject incorrect constraints or noise.
5. A depth-maximizing strategy, uMSA, merges the monomer MSAs with the raw UniProt hits used for pairing, but without pairing them. uMSA achieves the best overall mean DockQ on HD439 (0.623), outperforming pMSA and sMSA, and helps both the intra- and inter-species subsets.
6. Case studies show why depth helps: when one chain has a shallow MSA, monomer folding degrades and complex docking collapses; adding many homologs (even unpaired) restores monomer quality and enables accurate docking. Under fixed input limits, uMSA can also improve "MSA quality" by enriching medium-identity signal and reducing low-identity noise.
7. Mechanistic interpretation: AF3 can succeed without explicit inter-chain coevolution because (i) once the monomers are well constrained by deep MSAs, interfaces can be determined by physicochemical and shape complementarity, and (ii) AF3's architecture (an MSA module plus a deep Pairformer with triangle updates) can iteratively recover latent interaction patterns from mixed signals, reducing its dependence on explicit pairing.
8. Alternative pairing methods (annotation-based Distance/STRING/PDB pairing and PLM-based pairing variants) make only minor differences; default species pairing is slightly the best among the pairing methods, but the core result persists: uMSA (more homologs, no pairing) is consistently competitive or better.
9. External validation with omicMSA (very deep MSAs built from unassembled reads and draft genomes) supports the principle: the big gains come from deeper sequence resources. Shuffling paired sequences remains comparable to pairing for most targets, and a fully unpaired concatenation of the omicMSA subunit MSAs (oMSA) slightly outperforms paired omicMSA on average.
10. The paper also maps failure modes: accuracy drops for very large complexes (linked to AF3 crop/token limits), for small relative interface areas (transient or IDR-rich interactions), and for lower-quality experimental references (notably low-resolution cryo-EM structures and NMR ensembles). It also shows that pairing matters more for AFM and RF2 than for AF3, tying pairing sensitivity to architecture depth and update scheme; for AFM, pooling models from multiple MSA modes can improve results.
📜Paper: biorxiv.org/content/10.64898… #AlphaFold3 #ProteinComplex #MSA #Bioinformatics #ComputationalBiology #StructuralBiology #ProteinProteinInteractions #DeepLearning
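The four MSA modes compared above differ only in how rows are assembled, which is what makes sMSA a clean control. The Python sketch below illustrates this under stated assumptions: each MSA is a list of equal-length aligned strings, pairing has already produced row-matched hit lists for chains A and B, and unpaired rows are gap-padded on the partner chain. The function `build_rows` and the toy sequences are hypothetical; this is not the paper's pipeline or AF3's actual input format.

```python
import random


def pad(n: int) -> str:
    """Gap run used to fill the partner chain's columns in unpaired rows."""
    return "-" * n


def build_rows(mode, query_a, query_b, mono_a, mono_b, paired_a, paired_b, seed=0):
    """Return concatenated MSA rows (chain A columns, then chain B columns).

    mode: "mMSA" monomer MSAs only (block-diagonal, no paired rows)
          "pMSA" species-paired rows plus monomer rows
          "sMSA" paired rows with chain B permuted: same depth and composition,
                 but the A<->B row correspondence (coevolution) is broken
          "uMSA" pairing hits added as extra *unpaired* rows for more depth
    """
    la, lb = len(query_a), len(query_b)
    rows = [query_a + query_b]  # the query row is always paired with itself
    if mode in ("pMSA", "sMSA"):
        partners = list(paired_b)
        if mode == "sMSA":
            random.Random(seed).shuffle(partners)  # permute chain B halves only
        rows += [a + b for a, b in zip(paired_a, partners)]
    elif mode == "uMSA":
        # Keep the pairing hits for depth, but never align them row-to-row across chains.
        mono_a = list(mono_a) + list(paired_a)
        mono_b = list(mono_b) + list(paired_b)
    elif mode != "mMSA":
        raise ValueError(f"unknown mode: {mode}")
    # Unpaired rows: each chain's homologs are gap-padded on the other chain.
    rows += [a + pad(lb) for a in mono_a]
    rows += [pad(la) + b for b in mono_b]
    return rows


qa, qb = "MKV", "GSL"
for mode in ("mMSA", "pMSA", "sMSA", "uMSA"):
    rows = build_rows(mode, qa, qb, ["MRV"], ["GTL"], ["MKI", "LKV"], ["GSI", "ASL"])
    print(mode, len(rows))  # depth: mMSA 3, pMSA 5, sMSA 5, uMSA 7
```

Because sMSA keeps exactly the rows of pMSA and only permutes chain B's half, a DockQ difference between the two isolates the contribution of correct pairing, whereas uMSA changes depth instead.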
ContrastQA: A Label-guided Graph Contrastive Learning-based approach for protein complex structure quality assessment
1. ContrastQA introduces a framework for protein complex model quality assessment that integrates label-guided graph contrastive learning with geometric graph neural networks. The authors describe it as the first method to guide sample pairing using an interface quality score (DockQ), which substantially improves the learned protein structure embeddings.
2. Unlike traditional contrastive learning, which relies on random data augmentations, ContrastQA defines positive and negative pairs from domain knowledge, specifically the DockQ score, which reflects interface quality. This label-guided strategy allows more precise learning of local and global structural features.
3. The architecture combines GVP-GNN for geometric reasoning with a customized contrastive loss that weights negative pairs by their DockQ score differences, sharpening the separation between high- and low-quality decoys of the same target.
4. On the CASP16 dataset, ContrastQA achieved state-of-the-art performance among 12 advanced methods, with the lowest TMscore Top1 loss (0.123) and GDT-TS Top1 loss (0.116), improvements of 10.9% and 8.7%, respectively, over the second-best approaches.
5. ContrastQA is particularly strong on difficult targets: on CASP16 hard targets it reduced the TMscore ranking loss by 14.9% relative to the next-best method, showing its potential for evaluating challenging, template-free protein complexes.
6. The method also performs well on high-precision datasets. On the ABAG-AF3 dataset, derived from AlphaFold3 predictions, it achieved a TMscore ranking loss of 0.028 (9.7% lower than VoroIF-GNN) and a GDT-TS loss of 0.021 (22.2% lower than the runner-up), demonstrating broad applicability across datasets.
7. Ablation studies show that the graph contrastive learning module is critical: removing it worsened the CASP16 ranking loss by up to 37.9%. The weighted contrastive loss design also had a substantial impact, reinforcing the importance of the new contrastive learning strategy.
8. ContrastQA requires far less training data than other top competitors. It was trained on ~20,000 decoys, compared with the millions used by CASP16 teams such as GuijunLab-QA, yet still achieved superior performance, suggesting strong generalizability and sample efficiency.
9. Planned extensions include integrating physicochemical interface features into sample pairing and moving from global to local (interface- or residue-level) accuracy estimation, toward multi-task EMA frameworks for protein complexes.
💻Code: github.com/Cao-Labs/Contrast… 📜Paper: biorxiv.org/content/10.1101/… #ProteinComplex #StructuralBiology #GraphNeuralNetwork #ContrastiveLearning #AlphaFold #DeepLearning #Bioinformatics #ProteinQA
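Two ideas from this thread can be sketched in plain Python: the CASP-style Top1 ranking loss (true score of the best decoy minus the true score of the decoy the QA method ranks first) and a label-guided contrastive loss. The loss below is only one plausible form, assuming positives are same-target decoys within a DockQ margin `tau` and negatives are weighted by `1 + |ΔDockQ|`; `tau`, the weighting, the temperature, and all function names are hypothetical and not taken from ContrastQA's code.

```python
import math


def top1_loss(true_scores, predicted_scores):
    """Ranking loss: best true score minus the true score of the QA method's top pick.

    0.0 means the method picked the best decoy; larger values mean a worse pick.
    """
    best = max(true_scores)
    picked = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return best - true_scores[picked]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def dockq_weighted_contrastive_loss(embeddings, dockq, tau=0.1, temperature=0.5):
    """Hypothetical label-guided contrastive loss over decoys of ONE target.

    Pairs whose DockQ differs by <= tau are positives; all other pairs are negatives,
    each weighted by (1 + |dDockQ|) so that decoys with very different interface
    quality are pushed apart harder in embedding space.
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and abs(dockq[i] - dockq[j]) <= tau]
        neg = [j for j in range(n) if j != i and abs(dockq[i] - dockq[j]) > tau]
        if not pos or not neg:
            continue  # an anchor needs both pair types to contribute
        neg_term = sum(
            (1.0 + abs(dockq[i] - dockq[k]))
            * math.exp(cosine(embeddings[i], embeddings[k]) / temperature)
            for k in neg
        )
        for j in pos:
            p = math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            total += -math.log(p / (p + neg_term))  # InfoNCE-style term per positive pair
            count += 1
    return total / count if count else 0.0
```

With this form, embeddings that group decoys of similar interface quality yield a lower loss than embeddings that mix high- and low-DockQ decoys, which is the behavior the thread attributes to label-guided pairing.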
Comparative assessment of co-folding methods for molecular glue ternary structure prediction
1. This study provides the first large-scale benchmark of AI-based co-folding methods for modeling molecular glue (MG)-mediated ternary complexes, a foundational step toward rational MG design.
2. The authors curated MG-PDB, a high-quality dataset of 221 non-covalent MG ternary complexes, and introduced MGBench, a challenging benchmark set of 88 structures unseen during model training.
3. They evaluated five state-of-the-art co-folding models, AlphaFold 3 (AF3), Chai-1, Boltz-1, Protenix, and RoseTTAFold All-Atom (RFAA), using multiple structural quality metrics (DockQ, LDDT-PLI, ligand RMSD).
4. AlphaFold 3 performed best overall on MGBench, with a 50.6% success rate for protein-protein interfaces and 32.9% for MG-protein interaction recovery, outperforming the other methods by a significant margin.
5. However, performance dropped sharply for novel (low-homology) cases, revealing a strong memorization effect: the models performed well mainly on structures similar to their training data.
6. Large PPI interfaces, domain-domain interactions, and MG degraders (MGDs) posed particular challenges for all methods, indicating a need for better modeling of complex binding scenarios and cooperativity.
7. Domain-motif interactions and known CRBN-G-loop degrons were modeled with higher accuracy, suggesting that current methods exploit conserved structural motifs when they are available.
8. Detailed case studies showed accurate modeling of the CRBN-dWIZ-1-WIZ(ZF7) complex but failure on novel E3 ligase systems such as KBTBD4:UM171:HDAC1, reinforcing the need for broader training data and better generalization.
9. The benchmark shows that co-folding methods have yet to learn the atomic-level interaction rules required for truly de novo MG complex prediction, especially for systems lacking co-evolutionary signals.
10. MG-PDB and MGBench offer critical resources to drive structure-based MG discovery and guide the development of next-generation co-folding and scoring models.
💻Code: github.com/yiyanliao/MGBench 📜Paper: biorxiv.org/content/10.1101/… #MolecularGlue #AI4Science #AlphaFold #ProteinDesign #DrugDiscovery #ProteinComplex #TargetedDegradation #ComputationalBiology
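Success rates like those in point 4 are the fraction of targets whose prediction passes a quality cutoff. The sketch below assumes widely used cutoffs (DockQ ≥ 0.23 for an acceptable protein-protein interface and ligand RMSD ≤ 2 Å for a correct MG pose); MGBench may define success differently, and the `TernaryResult` record is a hypothetical container. The DockQ bins follow the standard definition from the original DockQ paper.

```python
from dataclasses import dataclass

# Assumed cutoffs, not necessarily the ones used by MGBench:
DOCKQ_ACCEPTABLE = 0.23   # standard "acceptable" DockQ threshold
LIGAND_RMSD_MAX = 2.0     # common pose-success cutoff, in Angstrom


@dataclass
class TernaryResult:
    target: str
    dockq: float        # protein-protein interface quality of the predicted ternary complex
    ligand_rmsd: float  # MG ligand RMSD in Angstrom (alignment protocol is an assumption)


def dockq_class(score: float) -> str:
    """Standard DockQ quality bins (Basu & Wallner, 2016)."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


def success_rates(results):
    """Return (PPI success rate, MG pose success rate) over a list of TernaryResult."""
    n = len(results)
    ppi = sum(r.dockq >= DOCKQ_ACCEPTABLE for r in results) / n
    lig = sum(r.ligand_rmsd <= LIGAND_RMSD_MAX for r in results) / n
    return ppi, lig
```

Splitting `results` by training-set homology before calling `success_rates` is one way to expose the memorization effect described in point 5.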
An All-Atom Generative Model for Designing Protein Complexes
1. APM (All-Atom Protein Generative Model) is a generative framework designed specifically to model, fold, and generate multi-chain protein complexes at all-atom resolution, an area long underserved by traditional single-chain models.
2. Unlike methods that rely on pseudo-sequence linking for multi-chain modeling, APM handles native multi-chain structures through architecture- and data-level innovations, allowing precise modeling of inter-chain interactions.
3. APM integrates a three-module pipeline: (1) a Seq&BB module that co-generates backbone and sequence via flow matching, (2) a Sidechain module that generates full-atom sidechain conformations, and (3) a Refine module that optimizes structures with all-atom awareness.
4. To maintain sequence-structure coherence during generation, APM employs a decoupled noising scheme and a two-phase training strategy, enabling high-fidelity reconstruction of both modalities.
5. On single-chain tasks, APM performs competitively with leading models such as ESM3 and ESMFold, and outperforms MultiFlow and ProteinGenerator on inverse folding and structure generation across a range of protein lengths.
6. APM is one of the first generative models to demonstrate reliable folding and inverse folding of multi-chain proteins without an MSA, outperforming Boltz-1 (no-MSA mode) and achieving high amino acid recovery and scTM scores.
7. In de novo complex generation, APM achieves significantly stronger binding energies and lower RMSD than Chroma, supporting its ability to design well-packed interfaces using all-atom features.
8. APM's chain-by-chain conditional generation enables controllable complex formation, supporting flexible design strategies in which chains fold independently and bind cooperatively.
9. In downstream applications, APM achieves state-of-the-art performance in antibody CDR-H3 co-design (RAbD benchmark) and targeted peptide design (LNR dataset), surpassing specialized models such as dyMEAN, DiffAb, and PepGLAD in binding affinity and structure quality.
10. By explicitly modeling all-atom details, natively handling multi-chain systems, and supporting both zero-shot and fine-tuned design tasks, APM paves the way for next-generation protein complex design with broad applications in therapeutic development.
💻Code: github.com/bytedance/apm 📜Paper: arxiv.org/abs/2504.13075 #proteincomplex #proteindesign #bioinformatics #proteinengineering #deepgenerativemodels #multichainproteins #antibodysdesign #peptidedesign #AI4Science #APMmodel #structuregeneration
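Points 3 and 4 mention flow matching and decoupled noising. As a minimal sketch, assuming a linear (rectified-flow style) interpolation path, continuous vector encodings for both sequence and backbone, and independently sampled times per modality, one training example could be built as below. These choices and all function names are illustrative assumptions; APM's actual parameterization (for example, how it treats discrete sequence tokens or backbone frames) may differ.

```python
import random


def interpolate(x0, x1, t):
    """Linear flow-matching path x_t = (1 - t) * x0 + t * x1 (t=0: noise, t=1: data)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """For the linear path, the velocity-field regression target is x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def decoupled_noising_example(seq_data, struct_data, rng):
    """Build one training example with an independent noise level per modality.

    Sampling t_seq and t_struct separately (instead of one shared t) exposes the
    model to mixed states such as 'clean sequence, noisy structure'.
    """
    t_seq, t_struct = rng.random(), rng.random()
    seq_noise = [rng.gauss(0.0, 1.0) for _ in seq_data]
    struct_noise = [rng.gauss(0.0, 1.0) for _ in struct_data]
    return {
        "t_seq": t_seq,
        "t_struct": t_struct,
        "seq_t": interpolate(seq_noise, seq_data, t_seq),
        "struct_t": interpolate(struct_noise, struct_data, t_struct),
        "seq_target": target_velocity(seq_noise, seq_data),
        "struct_target": target_velocity(struct_noise, struct_data),
    }


def mse(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


example = decoupled_noising_example([1.0, 0.0, 0.0], [0.1, 0.2, 0.3], random.Random(0))
```

A network (not shown) would be trained to predict `seq_target` and `struct_target` from the noisy inputs and both times, for instance with `mse`. Independent times let one model see folding-like states (clean sequence, noisy structure) and inverse-folding-like states (noisy sequence, clean structure).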
Assessing scoring metrics for AlphaFold2 and AlphaFold3 protein complex predictions
1. This study benchmarks commonly used scoring metrics for assessing the quality of protein complex models predicted by AlphaFold2 (via ColabFold) and AlphaFold3, revealing key differences in how accurately they evaluate heterodimeric interfaces.
2. Using 325 high-resolution heterodimeric structures, the authors found that ColabFold with templates and AlphaFold3 produced similarly high proportions of correct models, both outperforming ColabFold without templates, especially in generating models with high DockQ scores.
3. Interface-specific scores, particularly ipTM and model confidence, emerged as the most reliable metrics for evaluating protein complex models, consistently showing the highest correlation with DockQ across all datasets.
4. Global scores such as pLDDT and PAE were less effective than interface-focused metrics such as ipLDDT and iPAE. Notably, pDockQ2 correlated poorly with actual model quality and often misclassified high-quality models.
5. To address these inconsistencies, the authors developed C2Qscore, a combined score obtained by linear regression over multiple interface-specific metrics. C2Qscore outperformed individual metrics and unweighted averages, especially on X-ray-validated datasets.
6. The study derived method-specific cutoffs for key metrics such as ipTM to better classify models as correct or incorrect, including within the "gray zone" (ipTM = 0.6–0.8) defined by AlphaFold. These thresholds improved assessment precision and reduced false positives.
7. While AF3 produced more consistent results across replicates, CF-T (ColabFold with templates) generated slightly more top-quality predictions. However, AF3 models were more reliable when evaluated across a broader quality range.
8. For complex cryo-EM-derived assemblies, the study highlights limitations of all evaluated methods: when multiple dimeric configurations exist, DockQ often misclassifies good models because of reference mismatches, whereas C2Qscore remained relatively robust.
9. The authors integrated C2Qscore into the ChimeraX plug-in PICKLUSTER v2.0, enabling researchers to visualize and assess predicted interfaces with detailed, customizable metrics, including support for AlphaFold3 models.
10. This work provides a practical guide for interpreting and comparing protein complex predictions across AF2/ColabFold and AF3, offering scoring thresholds, benchmarking insights, and tools for improving confidence in computational structural biology.
💻Code: gitlab.com/topf-lab/picklust… 📜Paper: biorxiv.org/content/10.1101/… #AlphaFold2 #AlphaFold3 #proteincomplex #structureprediction #bioinformatics #modelassessment #interfaceprediction #ColabFold #scoringmetrics #AI4Science
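C2Qscore is described as a linear regression over interface-specific metrics. The sketch below fits an ordinary least-squares combination of such metrics against DockQ using the normal equations, plus a helper that applies AlphaFold's general ipTM guidance (above 0.8 confident, below 0.6 likely failed, in between the gray zone). The feature set, the fitted weights, and the function names are hypothetical; they do not reproduce C2Qscore's coefficients or the paper's method-specific cutoffs.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution on the upper-triangular system
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_combined_score(features, dockq):
    """OLS fit DockQ ~ w0 + w1*f1 + ... + wk*fk via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(f) for f in features]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, dockq)) for i in range(k)]
    return solve(xtx, xty)


def combined_score(weights, f):
    """Apply fitted weights (intercept first) to one model's metric vector."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], f))


def iptm_zone(iptm):
    """AlphaFold's general ipTM guidance, not the paper's method-specific cutoffs."""
    if iptm > 0.8:
        return "confident"
    if iptm < 0.6:
        return "likely incorrect"
    return "gray zone"
```

Usage would pass, for each model, a tuple such as (ipTM, ipLDDT) together with its DockQ against the reference; the fitted `combined_score` can then rank models, which is most useful for those whose ipTM falls in the gray zone.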
Rapid and sensitive protein complex alignment with Foldseek-Multimer @naturemethods
1. Foldseek-Multimer introduces a method for aligning protein complexes that is 3-4 orders of magnitude faster than traditional methods while maintaining accuracy comparable to the gold standard, US-align.
2. The key innovation is efficient chain-to-chain comparison with Foldseek, followed by clustering of the resulting superposition vectors to align entire complexes. This enables extremely rapid complex-to-complex comparisons across massive databases.
3. Able to align billions of protein complex pairs in just 11 hours, Foldseek-Multimer enables large-scale analysis of protein complexes in the AlphaFold era, especially in metagenomic studies.
4. The method is highly sensitive, identifying structural similarities even between distant homologs that traditional methods might miss, which is particularly useful for comparing complexes with low sequence similarity.
5. In benchmarks, Foldseek-Multimer identified millions of new similar homomeric pairs that other methods missed, demonstrating its ability to detect protein complex similarities at scale.
6. Its speed and sensitivity make Foldseek-Multimer a valuable tool for protein complex analysis, from basic research to applications in drug discovery and structural biology.
7. Foldseek-Multimer is available as open-source software with both command-line and web server access, making it easy to integrate into existing workflows.
@thesteinegger @clmgilchrist 💻Code: github.com/steineggerlab/fol… 📜Paper: nature.com/articles/s41592-0… #ProteinComplex #StructuralBioinformatics #ProteinAlignment #Bioinformatics #MachineLearning #DrugDiscovery #Metagenomics #AlphaFold #ComputationalBiology
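Point 2 describes aligning chains first and then clustering their superpositions. The simplified sketch below assumes each chain-to-chain hit comes with a rigid transform (rotation and translation) mapping the query chain onto the target chain; hits that belong to the same complex-level alignment should share nearly the same transform. The tolerances, the greedy clustering, and the `ChainHit` record are hypothetical stand-ins, not Foldseek-Multimer's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class ChainHit:
    query_chain: str
    target_chain: str
    rotation: tuple     # 3x3 rotation matrix, row-major, 9 floats
    translation: tuple  # 3 floats, Angstrom


def compatible(a, b, rot_tol=0.3, trans_tol=4.0):
    """Two chain-level superpositions agree if rotations and translations are both close."""
    rot_dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a.rotation, b.rotation)))
    trans_dist = math.dist(a.translation, b.translation)
    return rot_dist <= rot_tol and trans_dist <= trans_tol


def cluster_hits(hits, **tol):
    """Greedy clustering: each hit joins the first cluster whose seed it is compatible with."""
    clusters = []
    for h in hits:
        for c in clusters:
            if compatible(c[0], h, **tol):
                c.append(h)
                break
        else:
            clusters.append([h])
    return clusters


def best_complex_alignment(hits, **tol):
    """Return the largest one-to-one chain mapping found within a single cluster."""
    best = {}
    for c in cluster_hits(hits, **tol):
        mapping, used = {}, set()
        for h in c:
            if h.query_chain not in mapping and h.target_chain not in used:
                mapping[h.query_chain] = h.target_chain
                used.add(h.target_chain)
        if len(mapping) > len(best):
            best = mapping
    return best
```

The design point this illustrates is that the expensive structural work happens once per chain pair, and assembling a complex-level alignment reduces to cheap comparisons between small transform vectors.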
Physical-aware model accuracy estimation for protein complex using deep learning method
1. This study introduces DeepUMQA-PA, a deep learning model that estimates the accuracy of protein complex structures by integrating physical-aware features such as contact surface area and orientation, improving accuracy estimation for multimeric proteins.
2. Using Voronoi tessellation, DeepUMQA-PA computes detailed contact features that capture interactions between residues and with the solvent, which is especially useful for complexes with weak evolutionary signals such as nanobody-antigen pairs.
3. The model combines equivariant graph neural networks (EGNN) with ResNet layers and attention mechanisms, allowing it to outperform previous models such as DeepUMQA3 in residue-wise accuracy prediction, with improvements of 16.8% in Pearson and 15.5% in Spearman correlation on specific nanobody-antigen datasets.
4. DeepUMQA-PA outperforms the self-assessment of AlphaFold-Multimer and AlphaFold3 on 43% and 50% of tested targets, respectively, particularly in regions where these models show high uncertainty, so it complements existing approaches to protein structure accuracy assessment.
5. Ablation studies confirm the essential role of the physical-aware features: accuracy declines markedly when the contact area and orientation features are removed, highlighting their contribution to capturing protein-protein interactions.
6. The authors present the approach as a major step in protein structure validation and anticipate that extending DeepUMQA-PA to DNA/RNA-protein complexes and small-molecule interactions will open further applications in structural biology.
📜Paper: biorxiv.org/content/10.1101/… #ProteinComplex #DeepLearning #ModelAccuracy #Bioinformatics #GraphNeuralNetworks
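Point 3 mentions EGNN layers. Below is a minimal sketch of one E(n)-equivariant update in the form of Satorras et al. (2021), with toy hand-written functions standing in for the learned MLPs and a scalar edge attribute that could, hypothetically, carry a Voronoi contact area. It is not DeepUMQA-PA's architecture; it only illustrates why messages built from invariant distances yield coordinate updates that rotate and translate with the input.

```python
import math


def phi_e(hi, hj, d2, a_ij):
    """Toy message function (a learned MLP in a real EGNN); uses only invariant inputs."""
    return math.tanh(hi + hj - 0.1 * d2 + a_ij)


def phi_x(m):
    """Toy scalar weight for the coordinate update."""
    return 0.1 * m


def phi_h(hi, mi):
    """Toy node-feature update with a residual connection."""
    return hi + 0.5 * mi


def egnn_layer(h, x, edge_attr):
    """One E(n)-equivariant update on a fully connected graph.

    h: scalar node features, x: 3D coordinates,
    edge_attr[i][j]: scalar edge feature (hypothetically, a residue-residue contact area).
    """
    n = len(h)
    c = 1.0 / (n - 1)  # normalization constant C = 1/(M-1)
    new_h, new_x = [], []
    for i in range(n):
        m_sum = 0.0
        shift = [0.0, 0.0, 0.0]
        for j in range(n):
            if j == i:
                continue
            diff = [x[i][k] - x[j][k] for k in range(3)]
            d2 = sum(v * v for v in diff)  # squared distance: rotation/translation invariant
            m_ij = phi_e(h[i], h[j], d2, edge_attr[i][j])
            m_sum += m_ij
            w = phi_x(m_ij)
            for k in range(3):
                shift[k] += diff[k] * w  # relative vectors make the update equivariant
        new_x.append([x[i][k] + c * shift[k] for k in range(3)])
        new_h.append(phi_h(h[i], m_sum))
    return new_h, new_x
```

Translating or rotating all input coordinates leaves `new_h` unchanged and moves `new_x` in exactly the same way, which is the property that lets such layers score structures independently of their placement in space.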
Online now - the Spotlight "Constructive neutral evolution of homodimer to heterodimer transition" from Anne-Ruxandra @carvunis and colleagues. #GeneDuplication #ProteinComplex #ObligateHeterodimer #neofunctionalization authors.elsevier.com/a/1j~…
27 Jun 2024
🌟#notablepaper on #Selfassembly 📚Self-Assembling Lectin Nano-Block Oligomers Enhance Binding Avidity to Glycans 🔗mdpi.com/1440830 👨‍🔬By Prof. Ryoichi Arai et al @MDPIOpenAccess @MDPIBiologySubj #artificialprotein #fusionprotein #lectin #engineering #proteincomplex
🎁HBT #NobelLaureate Hartmut Michel is turning 75 today. After a PhD in 1977 with Dieter Oesterhelt (l-i-c.org/1128) @Uni_WUE, he succeeded at @MPI_Biochem in the #crystallization of a membrane #proteincomplex, the #photosynthetic reaction centre of a purple bacterium. Together with Robert Huber and Johann Deisenhofer, he solved the structure of this complex, which earned the team the 1988 #NobelPrize in #chemistry. Since 1987, Michel has been a director @MPIbp in Frankfurt, a hotbed of #membrane #proteinscience. #membraneproteins #photosynthesis #biochemistry
#RNA: Don’t kill the messenger. A newly found #ProteinComplex plays a vital role in RNA protection and stability during the RNA's journey between DNA and the cell’s protein factory. discovery.kaust.edu.sa/en/ar… discovery.kaust.edu.sa/en/ar…
11 Apr 2023
💜 PROTEIN COMPLEX AÇAÍ FLAVOR 💜 A new flavor in our protein family, Team 👊 Now available in our official store! The link is in our bio 💪 #ProteinComplex #NewMillen #ViviWinkler
Biotin receptor-mediated intracellular delivery of synthetic polypeptide-protein complexes. | S. Stolnik @UoN_Pharmacy | @UKICRS #lungdelivery #proteincomplex #biotintargeting doi.org/10.1016/j.jconrel.20…
8 Mar 2023
Newly found #proteincomplex plays a vital role in #RNA protection and stability @kaust_news doi.org/grwk6d phys.org/news/2023-03-newly-…
Sharing my #RSCPosterPitch for the poster "Why long-scale MD simulation? Insights from the discovery of Novel B, C-ring truncated deguelin derivatives for the cancer treatment". The poster itself is attached in a separate tweet. Looking forward to your questions. #RSCChemBio @RoySocChem