TranscriptFormer: A generative cell atlas across 1.5 billion years of evolution @ScienceMagazine science.org/doi/10.1126/scie…
A foundation model trained on 1.5 billion years of evolution
Comparing cells across species is one of the oldest problems in biology, and one of the hardest. Different organisms share fewer and fewer genes the further apart they sit on the tree of life, so traditional methods relying on orthologous gene mapping quickly hit a wall. A frog and a coral, separated by hundreds of millions of years, simply don't have enough common vocabulary to compare their cell types directly.
James Pearce and coauthors tackle this with TranscriptFormer, a generative autoregressive transformer trained on up to 112 million single cells from 12 species spanning 1.53 billion years of evolution: humans, mice, zebrafish, fruit flies, sponges, yeast, even malaria parasites. The key trick is that genes are not represented as discrete tokens but as ESM-2 protein language model embeddings, projecting every species into a shared, evolution-aware space. The model treats each cell as a "sentence" of genes and learns the joint distribution of which genes are expressed and at what level, using an expression-aware attention mechanism where transcript counts modulate the attention weights themselves.
The results are remarkable. The model classifies cell types in stony coral (685 million years from humans, never seen in training) with F1 above 0.65, while previous state-of-the-art models drop below 0.5. It transfers immune perturbation labels across mouse, rat, rabbit, and pig with F1 of 0.92, and detects drug-induced transcriptional states in human cells with mean AUC of 0.88 across 95 compounds.
Even more striking, phylogenetic relationships, developmental trajectories, and cell type hierarchies emerge in the embeddings without any supervision. Sponge choanocytes map to primary sensory neurons in worms and frogs, supporting old hypotheses about the origin of nervous systems. 
Pharma teams running cross-species safety studies could transfer perturbation signatures from rodents to human-relevant contexts without retraining, and biotech groups working on cell therapies or model organism screens get a generative tool that can be prompted to predict transcription factor targets directly. It is a step toward foundation models that act as queryable cell atlases rather than static lookup tables. Paper: Pearce et al., Science (2026) — journal license | science.org/doi/10.1126/scie…
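The post above describes two ideas: genes enter the model as ESM-2 protein embeddings rather than species-specific tokens, and transcript counts modulate the attention weights. Here is a minimal pure-Python sketch of how those two ideas could fit together, assuming a single linear projection shared by all species and an additive log(1 + count) bias on the attention logits. Both are illustrative assumptions, not TranscriptFormer's published formulation, and the random vectors only stand in for real ESM-2 embeddings; `expression_aware_attention`, `W_proj`, and the toy sizes are hypothetical names.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def expression_aware_attention(gene_embs, counts, Wq, Wk, Wv):
    """Causal (autoregressive) self-attention over one cell's genes.

    ASSUMPTION: each key gene's transcript count adds a log(1 + count) bias
    to its attention logit. This is one illustrative way for counts to
    modulate attention, not necessarily the paper's exact mechanism.
    """
    q = [matvec(Wq, e) for e in gene_embs]
    k = [matvec(Wk, e) for e in gene_embs]
    v = [matvec(Wv, e) for e in gene_embs]
    d = len(q[0])
    weights, outputs = [], []
    for i in range(len(gene_embs)):
        # Gene i attends only to itself and genes earlier in the "sentence".
        logits = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            + math.log1p(counts[j])
            for j in range(i + 1)
        ]
        w = softmax(logits)
        weights.append(w)
        # Output for gene i: attention-weighted sum of the value vectors.
        outputs.append([sum(w[j] * v[j][c] for j in range(i + 1))
                        for c in range(len(v[0]))])
    return weights, outputs

# Toy usage with random stand-ins for ESM-2 protein embeddings.
rng = random.Random(0)
D_PROT, D_MODEL = 8, 4  # toy sizes; real ESM-2 embeddings are much wider

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# One projection shared by every species: a gene's input depends only on
# its protein embedding, not on a species-specific vocabulary index.
W_proj = rand_matrix(D_MODEL, D_PROT)
protein_embs = [[rng.gauss(0.0, 1.0) for _ in range(D_PROT)] for _ in range(3)]
gene_embs = [matvec(W_proj, p) for p in protein_embs]
counts = [5, 0, 120]  # hypothetical transcript counts for the three genes

Wq, Wk, Wv = (rand_matrix(D_MODEL, D_MODEL) for _ in range(3))
weights, outputs = expression_aware_attention(gene_embs, counts, Wq, Wk, Wv)
```

Under this assumed bias, when query-key similarities are equal, a highly expressed gene draws proportionally more attention than a silent one, which is one concrete reading of "counts modulate the attention weights themselves."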
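A paper published in Science shows that TranscriptFormer, a generative AI foundation model trained on 112 million cells from 12 species, achieves high-accuracy cell type classification and zero-shot disease identification even between evolutionarily distant species, and that it autonomously learns universal cellular principles such as developmental trajectories and cell type hierarchies. science.org/doi/10.1126/scie…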
10/ Huge gratitude to the incredible collaborators and team who made this possible, especially James Pearce, whose scientific stamina throughout this work was extraordinary; @StephenQuake, who worked closely with us throughout; and @czi @ChanZuckerberg / @biohub for supporting this long-term research effort. TranscriptFormer has set a high bar for its successor models. The road ahead is even more exciting.
8/ TranscriptFormer was also the first major step in the Virtual Cell effort we started at CZI a little over two years ago. Our goal was to begin building predictive AI systems that can represent, simulate, and eventually reason about cellular systems across scales and modalities. Here, we began with gene expression through the lens of evolution.
5/ This is why I see TranscriptFormer as a first flexible world model of cross-species transcriptomics grounded in evolutionary data. It learns a generative representation of cellular state that supports prediction, comparison, and hypothesis generation across species, cell states, and perturbations.
So, here’s what you need to know:
☑️ Drug discovery is becoming a factory. The competitive moat isn't the smartest AI model - it's the automation, the feedback loop, the throughput. Build repeatable design-build-test-learn engines fusing computational models with automated labs, or get left behind.
☑️ Under “what leaders are underestimating,” the report states flatly: “Agentic biology is now real.” AI systems that sense, decide, and execute in closed loops - not as demos, but as operating infrastructure. The report calls treating this as “just biotech R&D” a category error.
☑️ Virtual cells got their own section. The report names Geneformer, TranscriptFormer, Xaira, and NVIDIA's Virtual Cell Challenge as real efforts toward simulating how cells respond to drugs computationally. The fact that this concept has graduated from academic conferences to boardroom strategy documents tells you the direction we're going in.
☑️ The report gestures at data as a bottleneck but undersells how specific the problem is. It's not just that biology needs more data. It's that data diversity - not model size - may be the binding constraint on progress. The models getting the best results right now aren't always the biggest ones. (3/4)
I enjoyed speaking at #PMWC2026 on the transition from data to knowledge. We’re using foundation models like UCE and TranscriptFormer to move beyond the "parts list" toward true clinical digital twins. As Sydney Brenner said: “Don't confuse data with knowledge.”
Closing my chapter at CZI Grateful to have led CZI’s AI efforts with a rigorous team, building AI for science from the ground up and helping shape the roadmap for AI-powered virtual cells, with foundational contributions like TranscriptFormer, VariantFormer, & rBio. Taking the holidays to recharge, then back to building in 2026. Onward! science.org/content/article/…
Excited to see our @ScientistTools ToolUniverse featured in @ScienceMagazine's piece on the future of virtual cell models. From TranscriptFormer to other AI cell models, ToolUniverse lets AI scientists test, analyze, and build on these tools. @HarvardDBMI @KempnerInst science.org/content/article/…
2 Nov 2025
Where do we stand with the Virtual Cell, the holy grail of biology? A feature @ScienceMagazine science.org/content/article/…
Time article discussing our work
We recently had the opportunity to talk to TIME magazine about the burgeoning landscape of virtual cell research across the frontier AI and biology teams, which I see as a position piece about the future potential of this vision. Also fun to see them link to a couple of our papers, like TranscriptFormer and our Cell paper on VCMs. Lots of work ahead! time.com/7324119/what-is-vir…
Pearce et al. presented TranscriptFormer, a family of generative foundation models trained on up to 112 million cells spanning 1.53 billion years of evolution across 12 species. ➡️ biorxiv.org/content/10.1101/… #SingleCell #SpatialBiology
The Chan Zuckerberg Initiative has released precomputed TranscriptFormer embeddings for the entire CZ CELL×GENE Census covering 106M human cells and 42M mouse cells, spanning 1000 cell types in healthy tissue and across hundreds of disease contexts: cellxgene.cziscience.com/cen…
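Precomputed embeddings like these make label transfer in embedding space possible without retraining, the kind of cross-species transfer described in the Pearce et al. thread. Here is a minimal sketch of one generic way to do it: cosine-similarity k-nearest-neighbor voting. It is an illustration under stated assumptions, not the classifier used in the paper, and it does not call the CZ CELL×GENE Census API; the two-dimensional vectors, the labels, and the helper names are all hypothetical stand-ins for downloaded embeddings.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_label_transfer(ref_embs, ref_labels, query_embs, k=3):
    """Give each query cell the majority label of its k most similar reference cells."""
    predictions = []
    for q in query_embs:
        ranked = sorted(range(len(ref_embs)),
                        key=lambda i: cosine(q, ref_embs[i]), reverse=True)
        votes = Counter(ref_labels[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# Hypothetical labeled "reference species" cells (synthetic 2-D vectors).
ref_embs = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
            [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
ref_labels = ["neuron"] * 3 + ["muscle"] * 3

# Hypothetical unlabeled "query species" cells in the same embedding space.
query_embs = [[0.95, 0.05], [0.05, 1.1]]
print(knn_label_transfer(ref_embs, ref_labels, query_embs))  # → ['neuron', 'muscle']
```

Neighbor voting like this only makes sense when both species live in one shared embedding space, which is the property that representing genes through protein embeddings is meant to provide.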