Let’s discuss scaling laws for virtual cells.
A Nature Methods paper (nature.com/articles/s41592-0…) published yesterday is being interpreted by some as evidence that scaling laws do not hold for virtual cells. I read it in detail, and here are my 2 cents:
It is a useful benchmark, but not a direct test of scaling laws in causal single-cell foundation models or perturbation-native virtual cells.
The paper mainly studies PCA, scVI, Geneformer, and SCimilarity (which are relatively small models) on observational atlas-style pretraining, with perturbation evaluation limited to a narrow Tahoe small-molecule/cancer-cell-line setting. These are important baselines for scFMs that focus on learning cell embeddings, but they are not large-scale causal perturbation models (e.g., diffusion-based virtual cells, or other modern architectures designed natively for causal perturbation biology).
The metrics also matter. Cell-type F1 and batch-integration AvgBIO are reasonable atlas/embedding metrics, but they are also tasks that can saturate quickly. They are not direct measures of causal perturbation prediction, target ranking, rare differential-expression tails, OOD genetic perturbations, or generalization across biological contexts.
The “learning saturation point” in the paper is useful, but it is not really a scaling law. It asks: what is the smallest pretraining size that reaches within 95% of the best observed score on this benchmark? That is a helpful diagnostic, but it can be overinterpreted when the downstream task itself is saturated.
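To make that definition concrete, here is a minimal Python sketch. It assumes "within 95%" means a score of at least 0.95 times the best observed score; the function name and the toy numbers are hypothetical and not taken from the paper.

```python
def saturation_point(sizes, scores, fraction=0.95):
    """Smallest pretraining size whose score is at least `fraction` of the best score.

    Assumes scores are positive and higher is better (an assumption, not the
    paper's exact procedure).
    """
    threshold = fraction * max(scores)
    # Walk through the runs from smallest to largest pretraining size.
    for size, score in sorted(zip(sizes, scores)):
        if score >= threshold:
            return size
    return None


# Toy numbers for illustration only.
sizes = [100_000, 1_000_000, 10_000_000, 100_000_000]

# A task with headroom: the threshold is first crossed at 1M cells.
improving = saturation_point(sizes, [0.80, 0.90, 0.93, 0.94])

# A task that is already near its ceiling: the smallest run "saturates".
saturated = saturation_point(sizes, [0.92, 0.93, 0.93, 0.94])

print(improving, saturated)
```

In the second case the diagnostic reports saturation at the smallest size simply because the task leaves almost no room to improve, which is the overinterpretation risk.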
The perturbation result is, IMO, limited: a few selected Tahoe-100M small molecules across several cancer cell lines, evaluated with genewise R²/MSE. The paper itself reports that a “no-change” baseline beats fine-tuned models for most drugs, which says as much about the evaluation regime as about model scaling.
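A toy Python sketch of why a “no-change” baseline can be hard to beat. It uses plain MSE over a handful of genes rather than the paper's exact genewise R²/MSE protocol, and every number here is made up for illustration.

```python
def mse(pred, obs):
    """Mean squared error between predicted and observed expression."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


# Hypothetical expression values for four genes (not real data).
control = [1.0, 2.0, 3.0, 4.0]      # unperturbed expression
observed = [1.0, 2.1, 3.0, 3.9]     # the perturbation barely moves most genes
model_pred = [1.2, 1.8, 3.3, 4.2]   # a model that predicts larger shifts

# The "no-change" baseline simply predicts the control profile.
no_change_mse = mse(control, observed)
model_mse = mse(model_pred, observed)

print(no_change_mse < model_mse)
```

When most genes barely respond, predicting “nothing happens” scores well on average error, so the metric mostly rewards not moving rather than capturing the few genes that do.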
In fact, our scGPT work already showed three years ago that simply scaling the number of observational cells saturates quickly after a few million cells. So I agree with the warning: naive “more atlas cells = better virtual cell” is not enough.
But that is not the real scaling question.
In X-Cell, we study scaling across multiple axes: number of perturbation cells, number of biological contexts, perturbation diversity, and model parameters. On our Perturb-seq-scale data, we observe clear and encouraging scaling behavior. Similar trends are emerging from other perturbation-native virtual cell efforts as well.
The important question is not: can more atlas cells improve cell-type F1? It is: with larger Perturb-seq datasets, larger models, better architectures, and harder OOD splits, can we predict causal cellular responses across genes, combinations, doses, cell states, and contexts?
For X-Cell and the next generation of virtual-cell models, the goal is not just better embeddings. It is target ranking, rare DE tails, counterfactual biology, and prospective perturbation prediction.
So my reading is: this paper is a useful caution against naive scaling, not evidence that scaling laws do not apply to virtual cells.
The exciting regime is still open: scaling the right data, the right models, and the right objectives for causal cellular biology.