2/ In biology, data diversity is often a bigger challenge than data scale, with existing datasets capturing only a fraction of true biological complexity. The inability to navigate these sparsely populated datasets is why many high profile foundation models for transcriptomics underperform simple baselines.
Alongside an efficient Multilayer Perceptron (MLP), TxPert combines curated literature graphs (STRING, GO) with proprietary, high-throughput experimental screening data to maximize predictive power.
TxPert shows promising progress towards predicting perturbation outcomes in entirely unseen cell lines where no perturbations were observed during training. This capability will bring us closer to performing highly targeted, confirmatory wet-lab screens, converting the lab from an exploratory tool to a validation tool, and ultimately helping to accelerate the discovery of novel medicines.