CAPTAIN: A multimodal foundation model pretrained on co-assayed single-cell RNA and protein
1. The paper introduces CAPTAIN, a multimodal single-cell foundation model that learns unified cell representations from co-assayed RNA and surface protein data, aiming to reduce the bias of transcriptome-only foundation models when protein phenotypes drive cell state.
2. A key enabling contribution is the scT&P-4M pretraining corpus: 4.26 million paired single cells from human and mouse, spanning 13 tissues, 249 samples, multiple platforms (mostly CITE-seq), and a harmonized vocabulary of 382 curated surface proteins with standardized naming and functional categorization.
3. Model design: a dual-encoder Transformer (RNA encoder + protein encoder) with cross-modal attention to align the two modalities in a shared latent space. The RNA encoder is initialized from scGPT (pretrained on 33M scRNA cells) and augmented with a gene-knowledge module encoding priors (GRN, promoter sequence, gene family, co-expression), plus species tokens for human/mouse.
4. The pretraining objective is explicitly multimodal and multi-task: masked gene expression reconstruction (unsupervised), protein abundance prediction (supervised over measured proteins), and protein prediction interval/uncertainty estimation via quantile regression. Proteins are thus treated as first-class signals, not just auxiliary outputs.
5. Protein inference is framed in two settings: fine-tuned prediction (adapt on a subset of paired cells from a new dataset) and zero-shot prediction (no dataset-specific paired training). Predictions are restricted to the fixed 382-protein vocabulary (not open-vocabulary), but can include proteins absent from a study’s antibody panel.
6. On protein imputation/expansion benchmarks across diverse datasets (e.g., human PBMC, MALT, monocytes; mouse PBMC), CAPTAIN reports consistently strong performance vs Seurat, sciPENN, TotalVI, scTranslator, and scTEL, with competitive zero-shot behavior and broader coverage when other methods cannot produce predictions for many proteins.
7. For cell type annotation, CAPTAIN fine-tuning adds a classifier on learned embeddings (RNA-only or multimodal). It reports 96.1% accuracy on PBMC CITE-seq, and shows particular strength in fine-grained T cell subtype labeling in bone marrow (Macro-F1 0.73 vs Seurat 0.61 and scGPT 0.04), emphasizing the value of protein-aware representations when RNA alone is insufficient.
8. For integration/batch harmonization, CAPTAIN is evaluated on multi-batch scRNA-seq and multi-omic datasets, aiming to balance batch removal with biological conservation. It is also tested on difficult cross-platform settings, aligning >60,000 cells across CITE-seq, ECCITE-seq, and TEA-seq, with additional metrics highlighting the remaining difficulty of full cross-technology mixing.
9. A notable downstream application is protein-informed cell–cell communication: CAPTAIN imputes receptor abundance (protein) while using ligand expression from RNA, then performs ligand–receptor inference with permutation testing. In PBMCs, it prioritizes 22 interactions from CD4 naive T to NK cells, with 18/22 (81.8%) supported by literature, and validates predicted signaling by showing upregulation of NicheNet-prioritized target genes in “strong communication” receiver cells.
10. In a COVID-19 multi-sample analysis, CAPTAIN suggests severity-associated increases in platelet-to-monocyte signaling and highlights a protein-driven S100A9–CD36 axis that may be missed by transcriptome-only approaches due to post-transcriptional decoupling; the paper further assesses structural plausibility with protein–protein docking.
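Side note on point 4: quantile regression is usually trained with the pinball (quantile) loss, and a pair of quantiles gives a prediction interval. Below is a minimal pure-Python sketch of that idea. It is not CAPTAIN's code: the function names, the toy abundance values, and the 0.05/0.95 quantile levels are all assumptions made for illustration, since the thread does not state which quantiles or loss weighting the paper uses.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean pinball (quantile) loss at quantile level q, with 0 < q < 1.

    Under-prediction (y > pred) is penalized with weight q,
    over-prediction (y < pred) with weight 1 - q.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, pred in zip(y_true, y_pred):
        diff = y - pred
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def interval_coverage(
    y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Fraction of observed values that fall inside [lower, upper]."""
    if len(y_true) == 0 or not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


# Toy example (made-up numbers): normalized abundance of one protein in 4 cells,
# with hypothetical model outputs for the 0.05 and 0.95 quantiles.
observed = [1.0, 2.0, 3.0, 4.0]
q05_pred = [0.5, 1.5, 3.2, 3.0]
q95_pred = [1.5, 2.5, 3.8, 5.0]

# A training objective in this style would sum the two quantile losses.
interval_loss = pinball_loss(observed, q05_pred, 0.05) + pinball_loss(observed, q95_pred, 0.95)
coverage = interval_coverage(observed, q05_pred, q95_pred)
```

In this toy case three of the four observed values fall inside their interval. A well-calibrated 0.05/0.95 pair should cover roughly 90% of held-out cells, which is one simple way to sanity-check uncertainty estimates of this kind.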
📜Paper:
doi.org/10.1038/s41467-026-7…
#SingleCell #CITEseq #MultiOmics #FoundationModels #Transformers #ProteinImputation #CellTypeAnnotation #BatchCorrection #CellCellCommunication #COVID19 #ComputationalBiology #Bioinformatics