mTM-align2: A Server for Real-time Protein Structure Database Search and Alignment
1. mTM-align2 is presented as a real-time web server that searches for structurally similar proteins across ~3 million structures (experimental and predicted) in seconds, while supporting both monomeric proteins and multimeric complexes.
2. The key idea is to replace expensive all-vs-all structural alignments with fast cosine-similarity search over precomputed structural fingerprints (embeddings), then optionally run detailed pairwise/multiple alignments with mTM-align for interpretability and validation.
3. For monomers, the server encodes a query using residue-level embeddings from the pretrained inverse-folding protein language model ESM-IF, aggregates them into a protein-level vector (sum pooling), and uses contrastive learning so cosine similarity correlates with structural similarity.
4. For multimers, it combines (a) chain-wise monomer embeddings and (b) a rotation-invariant global shape descriptor based on 3D Zernike moments; the final similarity (Q-score) is a weighted sum of chain similarity (IF-score) and shape similarity (ZP-score), with weights α=1 and β=0.3.
5. mTM-align2 integrates multiple major databases in one interface: PDB (including large monomer and multimer sets), SCOPe and CATH domain databases, BFVD viral structures, plus several AlphaFold DB subsets (Swiss-Prot, organism set, global health set), enabling cross-database retrieval rather than siloed searches.
6. The server offers two search modes: a high-accuracy mode that adds a fast TM-align-based filtering step to improve precision, and a high-speed mode that skips filtering for near-instant results; results are returned as ranked hits (top 1000) and can be emailed as CSV for batch workflows.
7. Beyond retrieval, the output workflow is designed for downstream analysis: users can launch pairwise superposition with TM-align metrics (TM-score, RMSD) and residue-level correspondence, or run multiple structure alignment (up to 10 selected hits via the UI) to identify conserved cores and generate a structure-based phylogenetic tree.
8. A practical annotation feature is included for PDB hits: after superposition, the server transfers ligand context from Q-BioLiP and marks query residues within 5 Å of the aligned ligand as putative binding residues, providing fast template-based binding-site hints (with the paper noting it is not meant to replace specialized binding-site predictors).
9. Benchmarks against Foldseek variants suggest complementary strengths for monomer search (mTM-align2 wins on more test cases by summed true-positive TM-scores in top-100, while Foldseek leads on others), and stronger multimer retrieval performance versus foldseek-multimer (top-50 recall ~82.4% vs 77.7%; precision ~35.08% vs 27.66%).
10. A case study on the MscL mechanosensitive channel highlights sensitivity to remote homologs and conformational diversity: against AFDB-SwissProt, mTM-align2 retrieves many more true positives than Foldseek, and multiple alignment of top hits reveals strictly conserved transmembrane helices linked to gating architecture.
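To make points 2 and 3 concrete, here is a minimal sketch of embedding-based retrieval: sum-pool residue-level vectors into one protein-level vector, then rank database entries by cosine similarity. The toy 3-dimensional vectors stand in for real ESM-IF embeddings, and the function names (`sum_pool`, `cosine`, `search`) and entry names are hypothetical, not taken from the server or the paper.

```python
import math

def sum_pool(residue_embeddings):
    """Collapse per-residue vectors into one protein-level vector by summing."""
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def search(query_vec, database, top_k=1000):
    """Rank precomputed database vectors by cosine similarity to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy data (assumption): two residues per protein, 3-dimensional embeddings.
query_residues = [[1.0, 0.0, 0.5], [0.5, 0.1, 0.5]]
database = {
    "toy_A": sum_pool([[0.8, 0.1, 0.4], [0.6, 0.1, 0.6]]),
    "toy_B": sum_pool([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
}

hits = search(sum_pool(query_residues), database, top_k=2)
for name, score in hits:
    print(name, round(score, 3))
```

Because the database vectors are precomputed, a query costs one embedding pass plus dot products, which is why this step can be so much cheaper than all-vs-all structural alignment. The contrastive training that makes cosine similarity track structural similarity is not shown here.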
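The multimer Q-score in point 4 is a plain weighted sum, which can be sketched as below. The pairing of α with the IF-score and β with the ZP-score follows the order in which the summary lists them and is an assumption here. The input scores are made-up numbers, and how the server derives the IF-score from the chain-wise embeddings is not covered.

```python
ALPHA = 1.0  # weight on chain similarity (IF-score), as reported in the summary
BETA = 0.3   # weight on global shape similarity (ZP-score), as reported in the summary

def q_score(if_score, zp_score, alpha=ALPHA, beta=BETA):
    """Weighted sum of chain-level and shape-level similarity."""
    return alpha * if_score + beta * zp_score

# Hypothetical candidates: (name, IF-score, ZP-score).
candidates = [
    ("complex_X", 0.80, 0.50),
    ("complex_Y", 0.75, 0.90),
    ("complex_Z", 0.40, 0.95),
]

ranked = sorted(candidates, key=lambda c: q_score(c[1], c[2]), reverse=True)
for name, if_s, zp_s in ranked:
    print(name, round(q_score(if_s, zp_s), 3))
```

With β well below α, the shape term works mostly as a tie-breaker: in this toy example it lifts complex_Y above complex_X despite a slightly lower chain score, but it cannot rescue complex_Z.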
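The 5 Å binding-residue annotation in point 8 amounts to a distance filter applied after superposition. The sketch below assumes the ligand coordinates are already in the query's frame and that a residue counts as "within 5 Å" if any of its atoms is that close to any ligand atom. The summary does not say which atoms the server measures from, so that criterion, the data layout, and the function name are all assumptions.

```python
import math

CUTOFF_ANGSTROM = 5.0  # distance threshold reported in the summary

def putative_binding_residues(query_residues, ligand_atoms, cutoff=CUTOFF_ANGSTROM):
    """Return IDs of residues with at least one atom within `cutoff` of any ligand atom.

    query_residues: dict mapping residue ID -> list of (x, y, z) atom coordinates
    ligand_atoms:   list of (x, y, z) coordinates, already superposed onto the query
    """
    selected = []
    for res_id, atoms in query_residues.items():
        if any(math.dist(atom, lig) <= cutoff for atom in atoms for lig in ligand_atoms):
            selected.append(res_id)
    return selected

# Toy coordinates (assumption), in angstroms.
query_residues = {
    "A:10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A:11": [(10.0, 0.0, 0.0)],
    "A:12": [(0.0, 4.0, 3.0)],
}
ligand_atoms = [(0.0, 0.0, 3.0)]

print(putative_binding_residues(query_residues, ligand_atoms))
```

As the paper itself cautions, this kind of template transfer gives quick hints rather than a substitute for a dedicated binding-site predictor.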
💻Code:
ngdc.cncb.ac.cn/biocode/tool…
📜Paper:
doi.org/10.1093/gpbjnl/qzag0…
#ProteinStructure #StructuralBioinformatics #AlphaFold #ProteinComplexes #StructureSearch #BioinformaticsTools #ESM #ContrastiveLearning #ZernikeMoments #TMAlign