🔬 Exciting News! Our manuscript, "scGPT: toward building a foundation model for single-cell multi-omics using generative AI", is now finally published in Nature Methods (@NatureMethods) 🎉 !!!
(Re-)Introducing scGPT: a foundation model engineered for single-cell omics analysis. Pretrained on over 33 million human cells, scGPT sets a new benchmark for application versatility, offering both fine-tuning and zero-shot capabilities.
Since its preprint in May 2023, scGPT has significantly impacted the field, evidenced by 13K installations, 600 GitHub stars 🌟, and 40 citations before its official publication!
scGPT has been validated by numerous benchmark studies as a leading foundation model in single-cell analysis. Its pre-trained embeddings extend its utility beyond single-cell studies, enhancing a variety of downstream tasks including protein enrichment and genetic perturbation predictions.
Some recent key updates:
---Expanded zero-shot applications for efficient reference mapping and integration, now with CellXGene census integration.
---Advanced perturbation analysis capabilities, including genome-scale perturb-seq data analysis and bulk sequencing data generalization.
---Upgraded scGPT package, offering versatile model loading compatible with PyTorch and flash-attn, for both GPU and CPU.
---Cloud-based scGPT applications for reference mapping, cell annotation, and gene regulatory network inference are available on scgpthub.org.
---Integration with Hugging Face for easier model training.
Limitations:
scGPT is an early foray into foundation models for single-cell omics, facing challenges like limited zero-shot learning in some tasks, pretraining constraints, data quality issues, and evaluation limitations. See our Supplementary Notes for details.
🚀 Future Work?
Short-Term Goals:
1. Releasing a mouse model for broader analysis.
2. Developing a comprehensive evaluation suite for foundation models in single-cell analysis.
3. Creating a foundation model for single-cell spatial omics.
4. Enhancing zero-shot capacity by integrating scGPT with RAG (e.g., knowledge graphs).
Long-Term Goals:
1. Expanding scGPT for comprehensive single-cell multi-omics analysis.
2. Developing an in-silico perturbation model for predicting genetic perturbation effects.
3. Merging scGPT with multi-modal genomic sequence models for a deeper understanding of cell biology.
📚 Access the paper on Nature Methods:
nature.com/articles/s41592-0…
🔬 Preprint on bioRxiv:
biorxiv.org/content/10.1101/…
💻 All our code, data, and weights are open source:
github.com/bowang-lab/scGPT
Wholehearted congratulations to all the authors, especially the two co-first authors, Haotian (@HAOTIANCUI1) and Chloe (@chloexwang1), who are truly emerging superstars in AI and biology!
@VectorInst @pmcc_ai @UofTCompSci @UofT_LMP @UHN @UofT
#scGPT #GenerativeAI #AI4Science #Combio #opensource