JASMINE: A powerful representation learning method for enhanced analysis of incomplete multi-omics data
1. JASMINE is a new self-supervised representation learning framework for incomplete multi-omics data. Its key strength is learning compact embeddings that retain both shared and modality-specific information, which keeps performance robust across diverse downstream tasks even under high rates of missing data.
2. Unlike many existing methods, JASMINE does not require retraining for each prediction task. A single training run per dataset produces task-agnostic embeddings that can be applied directly to classification, regression, survival prediction, or clustering.
3. To handle missing modalities, JASMINE combines cross-encoders (for modality-specific embeddings) with product-of-experts (PoE) models (for joint embeddings). The two sets of embeddings are concatenated to form the final representation, allowing flexible integration of data with arbitrary missingness patterns.
4. JASMINE introduces dual contrastive learning (CL): modality-level CL enforces consistency across modalities of the same sample, while sample-level CL uses deep clustering to preserve inter-sample relationships. Together they sharpen the discriminative structure of the learned embeddings.
5. Orthogonality constraints between the joint and modality-specific representations reduce redundancy and encourage the learning of complementary features, in line with biological principles such as consensus and complementarity.
6. In simulation studies, JASMINE outperformed the baselines (MVAE, CLUE, IntegrAO, DCCA, GCCA), especially under extreme missingness (up to 99%) and unbalanced missingness patterns. It was particularly robust when the number of modalities increased and the sample size was small.
7. On real-world data from TCGA (cancer) and ADNI (Alzheimer’s disease), JASMINE achieved top or near-top performance in all tasks.
8. Interpretability analyses showed that the embedding feature most predictive of cancer survival aligned with known biological mechanisms. Enriched pathways included MAPK and Wnt signaling, GABA synthesis, and interleukin signaling, all well documented in the cancer progression literature.
9. JASMINE also revealed inter-modality biological correspondences. Key genes such as MAP2K1 and MAPK9 were identified across the mRNA, methylation, proteomics, and miRNA layers, indicating that the model captures biologically meaningful cross-modal associations.
10. Overall, JASMINE presents a scalable, flexible, and biologically grounded solution for multi-omics integration, addressing both missing data and task generalization in a principled manner.
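For readers who want a feel for the mechanics in points 3 to 5, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes Gaussian experts with a standard-normal prior for the PoE step, a squared-Frobenius cross-product penalty for orthogonality, and a standard InfoNCE loss for modality-level CL. All function names (`poe_fuse`, `orthogonality_penalty`, `info_nce`) are hypothetical, and the sample-level deep-clustering CL is left out.

```python
import numpy as np


def poe_fuse(mus, logvars, present):
    """Product-of-experts fusion of Gaussian experts, skipping missing modalities.

    mus, logvars: float arrays of shape (M, N, D) for M modalities, N samples.
    present: boolean array of shape (M, N); False marks a missing modality.
    Assumption: a standard-normal prior expert is always included, so a sample
    with no observed modality falls back to mean 0 and variance 1.
    """
    present = np.asarray(present, dtype=bool)
    mask = np.broadcast_to(present[:, :, None], mus.shape)
    mus = np.where(mask, mus, 0.0)            # neutralise NaN placeholders
    logvars = np.where(mask, logvars, 0.0)
    precision = np.exp(-logvars) * mask       # missing experts get zero precision
    total = 1.0 + precision.sum(axis=0)       # 1.0 is the prior expert's precision
    joint_mu = (mus * precision).sum(axis=0) / total
    joint_var = 1.0 / total
    return joint_mu, joint_var


def orthogonality_penalty(joint, specific):
    """Squared Frobenius norm of joint^T @ specific, scaled by sample count.

    Zero when every joint dimension is uncorrelated (in the raw cross-product
    sense) with every modality-specific dimension.
    """
    cross = joint.T @ specific / joint.shape[0]
    return float(np.sum(cross ** 2))


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss: row i of z_a should match row i of z_b (same sample)."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, N, D = 3, 5, 4                          # modalities, samples, embedding dim
    mus = rng.normal(size=(M, N, D))
    logvars = rng.normal(size=(M, N, D))
    present = rng.random((M, N)) > 0.5         # random missingness pattern
    joint_mu, _ = poe_fuse(mus, logvars, present)

    # Stand-ins for modality-specific embeddings (cross-encoder outputs in the thread).
    specific = [rng.normal(size=(N, D)) for _ in range(M)]
    final = np.concatenate([joint_mu] + specific, axis=1)
    print(final.shape)
    print(orthogonality_penalty(joint_mu, specific[0]))
    print(info_nce(specific[0], specific[1]))
```

Because a missing modality contributes zero precision, the joint embedding is defined for any missingness pattern without imputing raw inputs, which is the property point 3 relies on.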
💻Code:
github.com/PennBBL/JASMINE
📜Paper:
biorxiv.org/content/10.1101/…
#MultiOmics #RepresentationLearning #Bioinformatics #Alzheimers #CancerResearch #MachineLearning #DeepLearning #SelfSupervised #ContrastiveLearning #ComputationalBiology