Biology AI Daily

Apr 17

Benchmarking MSA pairing for protein-protein complex structure prediction reveals a depth-over-pairing principle
1. The study benchmarks whether “paired MSAs” are truly necessary for accurate protein complex prediction in AlphaFold3, and concludes with a practical rule: MSA depth matters more than enforcing correct inter-chain pairing.
2. On a stringent benchmark of 439 non-redundant heterodimers (HD439; filtered to avoid similarity to AF3 training interfaces), adding species-paired MSAs (pMSA) only slightly improves mean DockQ vs unpaired concatenated monomer MSAs (mMSA): 0.613 vs 0.602.
3. The key control is “shuffle pairing” (sMSA): the paired sequences are randomly permuted to break inter-chain coevolution while preserving depth and composition. sMSA matches pMSA almost exactly (mean DockQ 0.612; P=0.96), implying the small gain from pMSA is largely due to extra sequences (depth), not correct pairing constraints.
4. Inter-species complexes are substantially harder than intra-species ones (pMSA mean DockQ 0.545 vs 0.632), consistent with weaker or absent coevolution and much shallower paired MSAs. Notably, for inter-species targets, sMSA can outperform pMSA (0.561 vs 0.545), suggesting species-based pairing can inject incorrect constraints/noise.
5. A depth-maximizing strategy, uMSA, merges the monomer MSAs with the raw UniProt hits used for pairing but without pairing. uMSA achieves the best overall mean DockQ on HD439 (0.623), outperforming pMSA/sMSA, and helps both intra- and inter-species subsets.
6. Case studies show why depth helps: when one chain has a shallow MSA, monomer folding degrades and complex docking collapses; adding many homologs (even unpaired) restores monomer quality and enables accurate docking. uMSA can also improve “MSA quality” under fixed input limits by enriching medium-identity signal and reducing low-identity noise.
7. Mechanistic interpretation: AF3 can succeed without explicit inter-chain coevolution because (i) interfaces can be determined by physicochemical/shape complementarity once monomers are well constrained by deep MSAs, and (ii) AF3’s architecture (MSA module plus deep Pairformer triangle updates) can iteratively recover latent interaction patterns from mixed signals, reducing dependence on explicit pairing.
8. Alternative pairing methods (annotation-based Distance/STRING/PDB and PLM-based pairing variants) provide only minor differences; default species pairing is slightly best among pairing methods, but the core result persists: uMSA (more homologs, no pairing) is consistently competitive or better.
9. External validation with omicMSA (very deep MSAs from unassembled reads/draft genomes) supports the principle: big gains come from deeper sequence resources. Shuffling paired sequences remains comparable to pairing for most targets, and a fully unpaired concatenation of omicMSA subunit MSAs (oMSA) slightly outperforms paired omicMSA on average.
10. The paper also maps failure modes: accuracy drops with very large complexes (linked to AF3 crop/token limits), with small relative interface areas (transient/IDR-rich interactions), and for lower-quality experimental references (notably low-resolution cryo-EM and NMR ensembles). It also shows pairing is more important for AFM and RF2 than AF3, tying pairing sensitivity to architecture depth/updates; for AFM, pooling models from multiple MSA modes can improve results.
📜Paper: biorxiv.org/content/10.64898…
#AlphaFold3 #ProteinComplex #MSA #Bioinformatics #ComputationalBiology #StructuralBiology #ProteinProteinInteractions #DeepLearning
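The “shuffle pairing” control from point 3 can be illustrated with a minimal Python sketch. It is not the paper's code: the row layout (a list of aligned chain-A/chain-B tuples), the choice to keep the first row fixed as the query pair, and the toy sequences are all assumptions made for illustration.

```python
import random


def shuffle_pairing(paired_rows, seed=0):
    """Permute chain-B partners in a paired MSA (hypothetical helper).

    paired_rows: list of (chain_a_aligned, chain_b_aligned) tuples.
    Row 0 is assumed to be the query pair and is left untouched.
    Depth and per-chain composition are preserved; only the
    inter-chain pairing of the homolog rows is scrambled.
    """
    if len(paired_rows) < 2:
        return list(paired_rows)
    query, rest = paired_rows[0], paired_rows[1:]
    chain_a = [a for a, _ in rest]
    chain_b = [b for _, b in rest]
    random.Random(seed).shuffle(chain_b)  # break inter-chain coevolution
    return [query] + list(zip(chain_a, chain_b))


# Toy paired MSA (made-up sequences, for illustration only).
toy_pmsa = [
    ("MKTA", "GSLV"),  # query pair
    ("MRTA", "GTLV"),
    ("MKSA", "GSIV"),
    ("MKTG", "ASLV"),
]
toy_smsa = shuffle_pairing(toy_pmsa, seed=1)
```

Because the shuffled alignment has the same number of rows and the same sequences per chain, any difference in downstream accuracy between the two inputs can be attributed to pairing rather than depth.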

Biology AI Daily @BiologyAIDaily

27 Jun 2025

ContrastQA: A Label-guided Graph Contrastive Learning-based approach for protein complex structure quality assessment
1. ContrastQA introduces a novel framework for protein complex model quality assessment by integrating label-guided graph contrastive learning with geometric graph neural networks. It is the first method to guide sample pairing using interface quality scores (DockQ), significantly improving protein structure embedding learning.
2. Unlike traditional contrastive learning that relies on random data augmentations, ContrastQA defines positive and negative pairs based on domain knowledge (specifically, the DockQ score, which reflects interface quality). This label-guided strategy allows more precise learning of local and global structural features.
3. The model architecture combines GVP-GNN for geometric reasoning with a customized contrastive loss that weighs negative pairs by DockQ score differences. This ensures better differentiation between high-quality and low-quality decoys from the same target.
4. On the CASP16 dataset, ContrastQA achieved state-of-the-art performance among 12 advanced methods, with the lowest TMscore Top1 loss (0.123) and GDT-TS Top1 loss (0.116), representing improvements of 10.9% and 8.7%, respectively, over the second-best approaches.
5. ContrastQA is particularly strong on difficult targets. For CASP16 hard targets, it reduced TMscore ranking loss by 14.9% compared to the next best method, showcasing its potential for evaluating challenging and template-free protein complexes.
6. The method also excels on high-precision datasets. On the ABAG-AF3 dataset, derived from AlphaFold3 predictions, it achieved a TMscore ranking loss of 0.028 (9.7% lower than VoroIF-GNN) and a GDT-TS loss of 0.021 (22.2% lower than the runner-up), demonstrating broad applicability across datasets.
7. Ablation studies show the graph contrastive learning module is critical: removing it worsened CASP16 ranking loss by up to 37.9%. The weighted contrast loss design also had a substantial impact, reinforcing the importance of their novel contrastive learning strategy.
8. ContrastQA requires far less training data than other top competitors. It was trained on ~20,000 decoys, compared to millions used by CASP16 teams like GuijunLab-QA, yet still achieved superior performance, suggesting strong generalizability and sample efficiency.
9. Future extensions will include integrating physicochemical interface features into sample pairing and expanding from global to local (interface or residue-level) accuracy estimation, moving toward multi-task EMA frameworks for protein complexes.
💻Code: github.com/Cao-Labs/Contrast…
📜Paper: biorxiv.org/content/10.1101/…
#ProteinComplex #StructuralBiology #GraphNeuralNetwork #ContrastiveLearning #AlphaFold #DeepLearning #Bioinformatics #ProteinQA
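The thread reports “Top1 loss” figures without defining the metric. A minimal sketch, assuming the definition commonly used in model quality assessment (the true score of the best decoy minus the true score of the decoy the method ranks first, averaged over targets), might look like this. The function names and toy scores are hypothetical and are not taken from the ContrastQA code.

```python
from statistics import mean


def top1_loss(true_scores, predicted_scores):
    """Top1 ranking loss for one target (assumed common definition).

    true_scores: ground-truth quality per decoy (e.g. TM-score).
    predicted_scores: the method's estimated quality per decoy.
    """
    if not true_scores or len(true_scores) != len(predicted_scores):
        raise ValueError("need two non-empty lists of equal length")
    # Index of the decoy the method would pick as its best model.
    top_idx = max(range(len(predicted_scores)), key=lambda i: predicted_scores[i])
    return max(true_scores) - true_scores[top_idx]


def mean_top1_loss(targets):
    """Average the per-target loss; targets maps name -> (true, predicted)."""
    return mean(top1_loss(t, p) for t, p in targets.values())


# Toy scores (made up): the method picks a 0.7 decoy when a 0.9 one exists.
toy_targets = {
    "T1": ([0.9, 0.7, 0.5], [0.2, 0.8, 0.1]),
    "T2": ([0.6, 0.4], [0.9, 0.3]),
}
toy_loss = mean_top1_loss(toy_targets)
```

A loss of zero means the method always selected the best available decoy, so lower values are better.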

Biology AI Daily @BiologyAIDaily

30 May 2025

Comparative assessment of co-folding methods for molecular glue ternary structure prediction
1. This study provides the first large-scale benchmark of AI-based co-folding methods for modeling molecular glue (MG)-mediated ternary complexes, a foundational step toward rational MG design.
2. The authors curated MG-PDB, a high-quality dataset of 221 non-covalent MG ternary complexes, and introduced MGBench, a challenging benchmark set of 88 structures unseen during model training.
3. They evaluated five state-of-the-art co-folding models: AlphaFold 3 (AF3), Chai-1, Boltz-1, Protenix, and RoseTTAFold All-Atom (RFAA), using multiple structural quality metrics (DockQ, LDDT-PLI, Ligand RMSD).
4. AlphaFold 3 demonstrated the best overall performance, with a 50.6% success rate for protein-protein interfaces and 32.9% for MG-protein interaction recovery in MGBench, outperforming other methods by a significant margin.
5. However, performance dropped sharply for novel cases (low homology), revealing a strong memorization effect. Models performed well mainly on structures similar to their training data.
6. Large PPI interfaces, domain-domain interaction types, and MGDs (degraders) posed particular challenges to all methods, indicating a need for better modeling of complex binding scenarios and cooperativity.
7. Domain-motif interactions and known CRBN-G-loop degrons were modeled with higher accuracy, suggesting current methods exploit conserved structural motifs when available.
8. Detailed case studies showed accurate modeling of CRBN-dWIZ-1-WIZ(ZF7) complex but failure on novel E3 ligase systems like KBTBD4:UM171:HDAC1, reinforcing the need for broader training data and improved generalization.
9. This benchmark confirms co-folding methods have yet to learn atomic-level interaction rules required for truly de novo MG complex prediction, especially for systems lacking co-evolutionary signals.
10. MG-PDB and MGBench offer critical resources to drive forward structure-based MG discovery and guide the development of next-generation co-folding and scoring models.
💻Code: github.com/yiyanliao/MGBench
📜Paper: biorxiv.org/content/10.1101/…
#MolecularGlue #AI4Science #AlphaFold #ProteinDesign #DrugDiscovery #ProteinComplex #TargetedDegradation #ComputationalBiology

Biology AI Daily @BiologyAIDaily

18 Apr 2025

An All-Atom Generative Model for Designing Protein Complexes
1. APM (All-Atom Protein Generative Model) is a novel generative framework specifically designed to model, fold, and generate multi-chain protein complexes at all-atom resolution, an area long underserved by traditional single-chain models.
2. Unlike methods that rely on pseudo-sequence linking for multi-chain modeling, APM handles native multi-chain structures through architecture and data-level innovations, allowing precise modeling of inter-chain interactions.
3. APM integrates a three-module pipeline: (1) Seq&BB module for co-generating backbone and sequence via flow matching, (2) Sidechain module to generate full-atom sidechain conformations, and (3) Refine module to optimize structures with all-atom awareness.
4. To maintain sequence-structure coherence during generation, APM employs a novel decoupled noising and two-phase training strategy, enabling high-fidelity reconstruction across both modalities.
5. Benchmarks on single-chain tasks show APM performs competitively with leading models like ESM3 and ESMFold, and outperforms MultiFlow and ProteinGenerator on inverse folding and structure generation across various protein lengths.
6. APM is one of the first generative models to demonstrate reliable folding and inverse folding on multi-chain proteins without MSA, outperforming Boltz-1 (noMSA) and achieving high amino acid recovery and scTM scores.
7. In de novo complex generation, APM achieves significantly stronger binding energies and lower RMSD compared to Chroma, validating its ability to design well-packed interfaces using all-atom features.
8. APM’s chain-by-chain conditional generation offers controllable complex formation, supporting flexible design strategies where chains fold independently and bind cooperatively.
9. On downstream applications, APM achieves state-of-the-art performance in antibody CDR-H3 co-design (RAbD benchmark) and targeted peptide design (LNR dataset), surpassing specialized models like dyMEAN, DiffAb, and PepGLAD in binding affinity and structure quality.
10. By explicitly modeling all-atom details, natively handling multi-chain systems, and supporting both zero-shot and fine-tuned design tasks, APM paves the way for next-generation protein complex design with broad applications in therapeutic development.
💻Code: github.com/bytedance/apm
📜Paper: arxiv.org/abs/2504.13075
#proteincomplex #proteindesign #bioinformatics #proteinengineering #deepgenerativemodels #multichainproteins #antibodysdesign #peptidedesign #AI4Science #APMmodel #structuregeneration

Biology AI Daily @BiologyAIDaily

18 Apr 2025

Assessing scoring metrics for AlphaFold2 and AlphaFold3 protein complex predictions
1. This study benchmarks commonly used scoring metrics to assess the quality of protein complex models predicted by AlphaFold2 (via ColabFold) and AlphaFold3, revealing key differences in their ability to evaluate heterodimeric interfaces accurately.
2. Using 325 high-resolution heterodimeric structures, the authors found that ColabFold with templates and AlphaFold3 produced similarly high proportions of correct models, both outperforming ColabFold without templates, especially in generating high DockQ-scoring models.
3. Interface-specific scores, particularly ipTM and model confidence, emerged as the most reliable metrics for evaluating protein complex models, consistently showing the highest correlation with DockQ across all datasets.
4. Global scores like pLDDT and PAE were less effective than interface-focused metrics such as ipLDDT and iPAE. Notably, pDockQ2 had poor correlation with actual model quality, often misclassifying high-quality models.
5. To address these inconsistencies, the authors developed C2Qscore, a combined score using linear regression over multiple interface-specific metrics. C2Qscore outperformed individual metrics and unweighted averages, especially in X-ray-validated datasets.
6. The study derived method-specific cutoffs for key metrics like ipTM to better classify models as correct or incorrect, including within the “gray zone” (ipTM = 0.6–0.8) defined by AlphaFold. These thresholds improved assessment precision and reduced false positives.
7. While AF3 produced more consistent results across replicates, CF-T (ColabFold with templates) generated a slightly higher number of top-quality predictions. However, AF3 models were more reliable when evaluated across a broader quality range.
8. In complex cryoEM-derived assemblies, the study highlights limitations of all evaluated methods: when multiple dimeric configurations exist, DockQ often misclassifies good models due to reference mismatches, but C2Qscore remained relatively robust.
9. The authors integrated C2Qscore into the ChimeraX plug-in PICKLUSTER v2.0, enabling researchers to visualize and assess predicted interfaces with detailed, customizable metrics, including support for AlphaFold3 models.
10. This work provides a practical guide for interpreting and comparing protein complex predictions across AF2/ColabFold and AF3, offering scoring thresholds, benchmarking insights, and tools for improving confidence in computational structural biology.
💻Code: gitlab.com/topf-lab/picklust…
📜Paper: biorxiv.org/content/10.1101/…
#AlphaFold2 #AlphaFold3 #proteincomplex #structureprediction #bioinformatics #modelassessment #interfaceprediction #ColabFold #scoringmetrics #AI4Science
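Points 5 and 6 describe two ideas that are easy to sketch: a combined score built as a weighted linear combination of interface metrics, and a three-way reading of ipTM around the 0.6–0.8 gray zone. The Python below is only an illustration of those ideas. The weights and intercept are placeholders (the fitted C2Qscore coefficients and the paper's method-specific cutoffs are not given in the thread), and the function names and labels are hypothetical.

```python
def combined_score(metrics, weights, intercept=0.0):
    """Linear combination of interface metrics (placeholder coefficients).

    metrics: dict of metric name -> value for one predicted model.
    weights: dict of metric name -> coefficient; in practice these would
    come from a regression fitted against DockQ, which is not shown here.
    """
    return intercept + sum(w * metrics[name] for name, w in weights.items())


def classify_iptm(iptm, low=0.6, high=0.8):
    """Bucket a model by ipTM using the gray-zone bounds quoted in the thread."""
    if iptm < low:
        return "likely incorrect"
    if iptm > high:
        return "likely correct"
    return "gray zone"


# Placeholder weights: NOT the published C2Qscore coefficients.
example_weights = {"ipTM": 0.5, "ipLDDT": 0.5}
example_model = {"ipTM": 0.8, "ipLDDT": 0.9}
example_score = combined_score(example_model, example_weights)
example_label = classify_iptm(0.7)
```

With equal weights this reduces to an unweighted average, which the thread says a fitted regression outperformed; the value of the approach lies in the learned coefficients.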

Biology AI Daily @BiologyAIDaily

9 Feb 2025

Rapid and sensitive protein complex alignment with Foldseek-Multimer @naturemethods
1. Foldseek-Multimer introduces a groundbreaking method for aligning protein complexes, achieving 3-4 orders of magnitude faster performance than traditional methods, while maintaining accuracy comparable to the gold standard, US-align.
2. The key innovation lies in its efficient chain-to-chain comparison using Foldseek, followed by superposition vector clustering to align entire complexes. This approach allows for extremely rapid complex-to-complex comparisons across massive databases.
3. With the ability to align billions of protein complex pairs in just 11 hours, Foldseek-Multimer offers unprecedented speed, enabling large-scale analysis of protein complexes in the AlphaFold era, especially in metagenomic studies.
4. The method has been shown to be highly sensitive, identifying structural similarities even between distant homologs, which traditional methods might miss. This is particularly useful for comparing complexes with low sequence similarity.
5. In benchmarks, Foldseek-Multimer identified millions of new similar homomeric pairs that other methods missed, demonstrating its superior capability in detecting protein complex similarities at scale.
6. Its speed and sensitivity make Foldseek-Multimer an essential tool for protein complex analysis, from basic research to applications in drug discovery and structural biology, allowing faster insights into complex molecular functions.
7. Available as open-source software, Foldseek-Multimer is accessible for researchers globally, with support for both command-line and web server usage, making it easy to integrate into existing workflows.
@thesteinegger @clmgilchrist
💻Code: github.com/steineggerlab/fol…
📜Paper: nature.com/articles/s41592-0…
#ProteinComplex #StructuralBioinformatics #ProteinAlignment #Bioinformatics #MachineLearning #DrugDiscovery #Metagenomics #AlphaFold #ComputationalBiology

Biology AI Daily @BiologyAIDaily

4 Nov 2024

Physical-aware model accuracy estimation for protein complex using deep learning method
1. This study introduces DeepUMQA-PA, a deep learning model designed to estimate the accuracy of protein complex structures by integrating physical-aware features like contact surface area and orientation, enhancing model precision for multimeric proteins.
2. Using Voronoi tessellation, DeepUMQA-PA calculates detailed contact features, capturing critical interactions between residues and solvents, which is especially useful for complexes with weak evolutionary signals like nanobody-antigen pairs.
3. The model leverages equivariant graph neural networks (EGNN) and ResNet layers with attention mechanisms, allowing it to outperform previous models like DeepUMQA3 in residue-wise prediction accuracy, with significant improvements of 16.8% in Pearson and 15.5% in Spearman correlations on specific nanobody-antigen datasets.
4. DeepUMQA-PA surpasses AlphaFold-Multimer and AlphaFold3 on 43% and 50% of tested targets, respectively, particularly excelling in regions where these models show high uncertainty, thus complementing traditional methods in protein structure accuracy assessment.
5. Ablation studies confirm the essential role of physical-aware features, showing a marked decline in accuracy when contact area and orientation features are removed, highlighting their contribution to capturing protein-protein interaction dynamics.
6. This approach represents a major step in protein structure validation, and the authors anticipate that expanding DeepUMQA-PA to DNA/RNA-protein complexes and small molecule interactions will unlock further applications in structural biology.
📜Paper: biorxiv.org/content/10.1101/…
#ProteinComplex #DeepLearning #ModelAccuracy #Bioinformatics #GraphNeuralNetworks

Trends in Biochemical Sciences @TrendsBiochem

30 Oct 2024

Online now - the Spotlight "Constructive neutral evolution of homodimer to heterodimer transition" from Anne-Ruxandra @carvunis and colleagues. #GeneDuplication #ProteinComplex #ObligateHeterodimer #neofunctionalization authors.elsevier.com/a/1j~…

YesPunjab.com @yespunjab

12 Oct 2024

Scientists find new brain target for anxiety disorders yespunjab.com/?p=46764 #MentalHealth #AnxietyDisorders #Neuroscience #BrainResearch #SynapticConnections #MentalIllness #CognitiveBehavior #TherapeuticInsights #ProteinComplex #ExcitatorySynapses #Neuropsychiatry

Scientists find new brain target for anxiety disorders - Yes Punjab News

New Delhi, Oct 12, 2024 Mental illnesses, such as anxiety disorders, autism and schizophrenia are among the leading health disorders worldwide. Scientists now report a new brain target for potential...

yespunjab.com

Phys.org @physorg_com

7 Aug 2024

Scientists take atomic look at a #proteinComplex that grants access to our DNA @BerkeleyLab @sciencemagazine phys.org/news/2024-08-scient…

Scientists take atomic look at a protein complex that grants access to our DNA

To transcribe the information contained in our genes or to repair the dozens of breaks that occur daily in our DNA, our enzymes must be able to directly access the DNA to perform their functions....

phys.org

IJMS MDPI @IJMS_MDPI

27 Jun 2024

🌟#notablepaper on #Selfassembly 📚Self-Assembling Lectin Nano-Block Oligomers Enhance Binding Avidity to Glycans 🔗mdpi.com/1440830 👨‍🔬By Prof. Ryoichi Arai et al @MDPIOpenAccess @MDPIBiologySubj #artificialprotein #fusionprotein #lectin #engineering #proteincomplex

Schematics of the lectin nano-blocks. (A) Design of the lectin nano-blocks. The lectin nano-blocks were constructed by fusing the dimeric de novo protein WA20 (PDB ID: 3VJF) [9] to the dimeric lectin Agrocybe cylindracea galectin (ACG) (PDB ID: 1WW7) [30] with different types of linkers (HL4, FL4, SL, and H). In addition, WA20-ΔN3ACG was constructed by fusing WA20 and ACG without a linker and with the deletion of the N-terminal 3 aa of ACG. Since both WA20 and ACG form dimers, the lectin nano-blocks are expected to form self-assembling oligomers in multiples of 2-mer. (B) Schematics of the binding of the lectin nano-blocks and ACG to target glycans on cells. Because the lectin nano-block oligomers have more carbohydrate recognition domains (CRDs) than the original ACG, they are expected to enhance the binding avidity to target glycans through a multivalent binding effect.

Lives in Chemistry @livesinchem

18 Jul 2023

🎁HBT #NobelLaureate Hartmut Michel, turning 75 today. After a PhD in 1977 with Dieter Oesterhelt (l-i-c.org/1128) @Uni_WUE, he succeeded @MPI_Biochem in the #crystallization of a membrane #proteincomplex, the #photosynthetic reaction centre of a purple bacterium. Together with Robert Huber and Johann Deisenhofer he solved the structure of this complex, which earned the team the #NobelPrize in #chemistry in 1988. Since 1987 Michel has been a director @MPIbp in Frankfurt, a hotbed of #membrane #proteinscience. #membraneproteins #photosynthesis #biochemistry

ALT Hartmut Michel in his Frankfurt Laboratory, ca. 1988 · unknown photographer, published on pinterest by “C Afé Ruiz”

ALT Hartmut Michel lecture at the 50 year anniversary ceremony of the MPI for Biochemistry Martinsried, April 2023 · photo by Eva E. Wille

Nayef Al-Rodhan @SustainHistory

13 Apr 2023

#RNA: Don’t kill the messenger. A newly found #ProteinComplex plays a vital role in RNA protection & stability during its journey between DNA and the cell’s protein factory. discovery.kaust.edu.sa/en/ar…

newmillen @newmillen

11 Apr 2023

💜 PROTEIN COMPLEX, AÇAÍ FLAVOR 💜 A new flavor in our protein family, Team 👊 Now available in our official store! Link in bio 💪 #ProteinComplex #NewMillen #ViviWinkler

J Controlled Release @JCRnEDITORS

9 Apr 2023

Biotin receptor-mediated intracellular delivery of synthetic polypeptide-protein complexes. | S. Stolnik @UoN_Pharmacy | @UKICRS #lungdelivery #proteincomplex #biotintargeting doi.org/10.1016/j.jconrel.20…

Phys.org @physorg_com

8 Mar 2023

Newly found #proteincomplex plays a vital role in #RNA protection and stability @kaust_news doi.org/grwk6d phys.org/news/2023-03-newly-…

Ashish Shrivastava @ashishapv

2 Mar 2023

Have a look at this short video. #simulation #protein #ligand #docking #proteincomplex #desmond @DEShawResearch #opls

Dr. KIRAN LOKHANDE @jivkiran007

28 Feb 2023

Sharing #RSCPosterPitch for the poster "Why long-scale MD simulation? Insights from the discovery of Novel B, C-ring truncated deguelin derivatives for the cancer treatment". Also attached poster in separate tweet. Looking forward to your questions. #RSCChemBio @RoySocChem
