🚀 Our perspective is out in @Nature!
We present a roadmap for Multimodal Foundation Models (MFMs) — large AI models pretrained across multi-omics and multi-timepoint data — to serve as the computational backbone for building virtual cells.
Read the full paper in Nature:
nature.com/articles/s41586-0…
🔍 Why MFMs?
Biology is inherently multimodal, and molecular layers are deeply interconnected and context-specific. MFMs aim to integrate these layers to uncover shared biological principles that govern diverse cell states, offering a unified substrate for downstream inference.
🧠 What’s new?
💡 From hypothesis-driven to data-centric workflows: MFMs shift biology’s paradigm. Instead of crafting bespoke models for narrow tasks, we can now pretrain over massive datasets, distill foundational knowledge, and refine insights through lab-in-the-loop experimentation—where models guide experiments, and experiments update models.
🧬 Conditional gene regulation: MFMs go beyond static models. By training across multiple omics layers (e.g., chromatin accessibility, transcriptomics), they can learn context-specific gene functions and regulatory programs—key to understanding development and disease.
🧪 In silico perturbation: Biology’s combinatorial complexity is immense: thousands of genes, millions of interactions. MFMs provide a framework to simulate perturbations before wet-lab execution. Trained on CRISPR-based Perturb-seq data, they can predict molecular responses across cell types, tissues, and time points, enabling programmable biology at scale.
⚙️ What makes MFMs possible?
Envisioned techniques include:
- Unified tokenization from nucleotides to pathways
- Hybrid attention across intra- and inter-modal interactions
- Prompt-driven multitasking for temporal prediction, conditional generation, and modality translation
- Human knowledge integration from curated databases and biomedical literature
These design principles translate the architecture of foundation models into the molecular domain.
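To make the first two bullets a little more concrete, here is a minimal NumPy sketch of one possible reading of "unified tokenization" plus "hybrid attention". It assumes (our illustration, not the paper's specification) that features from every modality share a single token vocabulary tagged with a modality prefix, and that "hybrid" means one attention head restricted to tokens of the same modality and another restricted to cross-modal pairs. The token strings, function names, and sizes are all hypothetical.

```python
import numpy as np

# Hypothetical shared vocabulary: every token is "<modality>:<feature>", so genes,
# chromatin peaks, and proteins live in one token space (illustrative assumption).
tokens = ["RNA:GATA1", "RNA:SPI1", "ATAC:chr1:1000-1500", "ATAC:chrX:200-700", "PROT:CD34"]
modality_ids = [t.split(":", 1)[0] for t in tokens]
vocab = {t: i for i, t in enumerate(sorted(set(tokens)))}

def modality_masks(mods):
    """intra[i, j] is True when tokens i and j share a modality; inter covers
    cross-modal pairs plus the diagonal, so every row has at least one entry."""
    m = np.asarray(mods)
    intra = m[:, None] == m[None, :]
    inter = ~intra | np.eye(len(m), dtype=bool)
    return intra, inter

def attention_weights(q, k, mask):
    """Row-wise softmax of scaled dot products, with disallowed pairs set to zero."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def hybrid_attention(x, mods, w_intra, w_inter):
    """One intra-modal head and one inter-modal head, outputs concatenated."""
    intra, inter = modality_masks(mods)
    heads = []
    for mask, (wq, wk, wv) in ((intra, w_intra), (inter, w_inter)):
        heads.append(attention_weights(x @ wq, x @ wk, mask) @ (x @ wv))
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
d_model, d_head = 8, 4
embedding = rng.normal(size=(len(vocab), d_model))
x = embedding[[vocab[t] for t in tokens]]  # (5, d_model)
w_intra = [rng.normal(size=(d_model, d_head)) for _ in range(3)]  # Wq, Wk, Wv
w_inter = [rng.normal(size=(d_model, d_head)) for _ in range(3)]
out = hybrid_attention(x, modality_ids, w_intra, w_inter)
print(out.shape)  # (5, 8)
```

Splitting heads this way is only one option; interleaving intra- and inter-modal layers, or learning the mixing, are equally plausible designs. The sketch only shows the masking mechanics.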
⚠️ What are the challenges?
MFMs aren’t just about scale—they demand accessibility, reliability, and transparency.
- Low-resource learning techniques (e.g., LoRA, adapters) are vital for democratizing model fine-tuning
- Human-agnostic benchmarks are needed, as conventional labels may punish models that uncover novel biology
- Uncertainty modeling is essential to mitigate hallucinations and increase scientific trust
Interpretability and ethical stewardship must be foundational in this emerging ecosystem.
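For readers unfamiliar with LoRA, here is a minimal NumPy sketch of the idea as it is commonly described: a pretrained weight W stays frozen, and only a low-rank update B·A (scaled by alpha/rank, with B initialised to zero) is trained. The class name, layer sizes, rank, alpha, and learning rate are arbitrary illustrative choices, not anything specified in the perspective.

```python
import numpy as np

class LoRALinear:
    """Hypothetical layer: frozen weight W (d_out x d_in) plus trainable low-rank B @ A."""

    def __init__(self, w_frozen, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w_frozen.shape
        self.w = w_frozen                                   # pretrained, never updated
        self.a = rng.normal(scale=0.01, size=(rank, d_in))  # trainable
        self.b = np.zeros((d_out, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self):
        return self.a.size + self.b.size

    def sgd_step(self, x, target, lr=1e-3):
        """One gradient step on L = ||y - target||^2 / (2n); only A and B change."""
        n = x.shape[0]
        xa = x @ self.a.T                                   # (batch, rank)
        err = (self(x) - target) / n                        # dL/dy
        grad_b = self.scale * err.T @ xa                    # (d_out, rank)
        grad_a = self.scale * (err @ self.b).T @ x          # (rank, d_in)
        self.b -= lr * grad_b
        self.a -= lr * grad_a

rng = np.random.default_rng(1)
w = rng.normal(size=(512, 1024))  # stand-in for one pretrained projection
layer = LoRALinear(w, rank=4)
x = rng.normal(size=(2, 1024))

# With B at zero, the adapted layer starts out identical to the frozen one.
print(np.allclose(layer(x), x @ w.T))    # True
print(layer.trainable_params(), w.size)  # 6144 524288

target = rng.normal(size=(2, 512))
before = np.mean((layer(x) - target) ** 2)
layer.sgd_step(x, target)
after = np.mean((layer(x) - target) ** 2)
print(after < before)  # True
```

The point of the design is the parameter count: here roughly 1% of the frozen matrix is trainable, which is what makes adapting a large pretrained model feasible on modest compute.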
Kudos to all co-authors for the collective effort and vision: @HOATIANCUI1, @Alejandro__TL, @mariabrbic, @JulioSaezRod, @simocristea, @genophoria, @mo_lotfollahi, @fabian_theis.
Let’s build the future of virtual cells together.