Biology AI Daily

28 Oct 2025

A Novel Framework for Multi-Modal Protein Representation Learning

1. The article introduces DAMPE, a new framework for protein function prediction that integrates sequence, structure, and extrinsic data like protein-protein interactions and Gene Ontology annotations. This integration is crucial for accurate function prediction as it combines intrinsic and extrinsic biological signals.

2. DAMPE addresses two key challenges: cross-modal distributional mismatch and noisy relational graphs. It uses Optimal Transport (OT) for representation alignment, effectively mitigating differences between sequence and structure embeddings, and Conditional Graph Generation (CGG) for robust information fusion, avoiding issues with noisy PPI networks.

3. The framework demonstrates significant improvements over state-of-the-art methods like DPFunc, achieving AUPR gains of 0.002–0.013 and Fmax gains of 0.004–0.007. Ablation studies highlight the contributions of both OT and CGG components, with OT contributing 0.043–0.064pp AUPR and CGG adding 0.005–0.111pp Fmax.

4. Theoretical analysis in the paper shows that the CGG objective drives the condition encoder to absorb graph-aware knowledge into protein representations, enhancing the robustness and effectiveness of the embeddings for downstream tasks.

5. DAMPE offers a scalable and theoretically grounded approach for multi-modal protein representation learning. It avoids the computational overhead of traditional GNNs by using a lightweight Mixture-of-Experts architecture, making it suitable for large-scale applications.

6. The framework is evaluated on standard Gene Ontology benchmarks, demonstrating superior performance across multiple metrics. It effectively captures functional patterns in protein embeddings, as shown through qualitative evaluations and clustering metrics.

📜Paper: arxiv.org/abs/2510.23273v1

#ProteinFunctionPrediction #MultiModalLearning #OptimalTransport #ConditionalGraphGeneration #Bioinformatics #DeepLearning
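To make the Optimal Transport alignment idea from point 2 concrete, here is a minimal NumPy sketch of entropy-regularised OT (Sinkhorn iterations) that maps one set of embeddings onto another. This is an illustration under stated assumptions and not DAMPE's actual implementation. The function names `sinkhorn_plan` and `align_to_target`, the uniform marginals, the squared Euclidean cost, the regularisation strength, and the synthetic "sequence" and "structure" embeddings are all hypothetical choices made for this example.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularised OT plan between two uniform distributions.

    Hypothetical helper for illustration; not taken from the DAMPE paper.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on source points
    b = np.full(m, 1.0 / m)          # uniform mass on target points
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows towards marginal a
        v = b / (K.T @ u)            # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]


def align_to_target(source, target, reg=0.1, n_iters=200):
    """Map source embeddings into the target space via barycentric projection."""
    diff = source[:, None, :] - target[None, :, :]
    cost = (diff ** 2).sum(axis=-1)  # squared Euclidean cost (an assumption)
    cost = cost / cost.max()         # normalise for numerical stability
    plan = sinkhorn_plan(cost, reg=reg, n_iters=n_iters)
    mapped = (plan @ target) / plan.sum(axis=1, keepdims=True)
    return mapped, plan


# Synthetic stand-ins for two modalities with mismatched distributions.
rng = np.random.default_rng(0)
seq_emb = rng.normal(0.0, 1.0, size=(50, 8))     # hypothetical sequence embeddings
struct_emb = rng.normal(3.0, 0.5, size=(60, 8))  # hypothetical structure embeddings

aligned, plan = align_to_target(seq_emb, struct_emb)
gap_before = np.linalg.norm(seq_emb.mean(axis=0) - struct_emb.mean(axis=0))
gap_after = np.linalg.norm(aligned.mean(axis=0) - struct_emb.mean(axis=0))
print(f"mean gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

Each aligned vector is a weighted average of target embeddings, so the mapped points lie inside the target distribution. That is the sense in which OT can reduce a distributional mismatch between modalities. How DAMPE defines its cost, marginals, and training objective is described in the paper itself.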
