Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells
Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian, Hong Cheng, Wenbing Huang, Deli Zhao, Yu Rong
2026-04-01
Summary
This paper introduces Lingshu-Cell, a generative computer model that learns the gene expression states of cells and predicts how those states change when the cells are perturbed. It's like building a virtual cell that scientists can experiment on without actually working with real cells.
What's the problem?
Current foundation models for single-cell gene expression data are good at producing static descriptions of what cells are *like*, but they do not explicitly model the full range of states that cells can occupy, which is what is needed to generate realistic new cells or simulate their responses. Existing approaches also often require researchers to pre-select genes, for example by keeping only highly variable or highly expressed ones, potentially missing crucial information. The challenge is to create a model that can accurately represent the distribution of possible cell states and predict how cells respond to different conditions.
What's the solution?
The researchers developed Lingshu-Cell, which uses a technique called a 'masked discrete diffusion model'. Essentially, it learns the patterns in gene expression data from many cells by hiding parts of each cell's expression profile and learning to fill them back in, and it can then generate new, realistic cell states. Importantly, it represents expression as discrete tokens across approximately 18,000 genes, without needing to pick out specific genes beforehand. It can also be conditioned on a cell's identity, such as its cell type or donor, together with a perturbation, so scientists can simulate how a given kind of cell responds to a given stimulus, including combinations of identity and perturbation that were not seen together during training.
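To make the idea of masked discrete diffusion more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation. Everything specific in it is an assumption made for illustration: the log-binning of counts into 8 tokens, the `MASK = -1` sentinel, the function names, the unmasking schedule, and above all `toy_denoiser`, which returns uniform probabilities where a real model would use a trained network.

```python
import numpy as np

MASK = -1  # hypothetical sentinel for a hidden gene (assumption, not from the paper)


def bin_counts(counts, n_bins=8):
    """Turn raw counts into discrete tokens via log1p binning (illustrative choice)."""
    logged = np.log1p(np.asarray(counts, dtype=float))
    # n_bins - 1 interior edges give token ids in 0 .. n_bins - 1
    edges = np.linspace(0.0, logged.max() + 1e-8, n_bins + 1)[1:-1]
    return np.digitize(logged, edges)


def forward_mask(tokens, t, rng):
    """Forward (noising) step: hide each gene independently with probability t."""
    hide = rng.random(tokens.shape) < t
    return np.where(hide, MASK, tokens), hide


def toy_denoiser(noisy, n_bins):
    """Stand-in for a trained network: uniform probabilities for every gene."""
    return np.full(noisy.shape + (n_bins,), 1.0 / n_bins)


def sample(n_genes, n_bins, steps, rng):
    """Reverse process: start fully masked and reveal genes over several steps."""
    x = np.full(n_genes, MASK)
    for step in range(steps):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        # Reveal an equal share of the remaining masked genes at each step,
        # so that nothing is left masked after the final step.
        n_reveal = int(np.ceil(masked.size / (steps - step)))
        chosen = rng.choice(masked, size=n_reveal, replace=False)
        probs = toy_denoiser(x, n_bins)
        for g in chosen:
            x[g] = rng.choice(n_bins, p=probs[g])
    return x


rng = np.random.default_rng(0)
tokens = bin_counts([0, 0, 1, 5, 120, 0, 3])
noisy, hidden = forward_mask(tokens, t=0.5, rng=rng)
generated = sample(n_genes=50, n_bins=8, steps=5, rng=rng)
```

During training, a real denoiser would be fitted to predict the original tokens at the hidden positions of `noisy`. With the uniform stand-in used here, `generated` is random and only demonstrates the control flow of iterative unmasking.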
Why does it matter?
Lingshu-Cell provides a more flexible way to simulate cells, because it generates whole-transcriptome cell states and can be conditioned on both cell identity and perturbation. This could let scientists test scenarios such as the effect of a genetic perturbation or a cytokine stimulus on a computer before committing to experiments. The authors report that it reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions across diverse tissues and species, and that it achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs.
Abstract
Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.
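One way to picture how jointly embedding identity and perturbation allows prediction for novel combinations is the small sketch below. It is an illustration under stated assumptions, not the authors' architecture: the labels, the embedding width `DIM`, the random (untrained) lookup tables, and the choice to concatenate the two embeddings are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # hypothetical embedding width

# Hypothetical labels; a trained model would learn one vector per label.
identities = ["cell_type_A", "cell_type_B"]
perturbations = ["perturbation_X", "perturbation_Y"]

identity_table = {name: rng.normal(size=DIM) for name in identities}
perturbation_table = {name: rng.normal(size=DIM) for name in perturbations}


def condition_vector(identity, perturbation):
    """Build one conditioning vector from separate identity and perturbation embeddings."""
    return np.concatenate([identity_table[identity], perturbation_table[perturbation]])


# Pairs assumed to have been observed together during training.
seen_pairs = {
    ("cell_type_A", "perturbation_X"),
    ("cell_type_B", "perturbation_X"),
    ("cell_type_B", "perturbation_Y"),
}

# A combination never observed together still has a well-defined conditioning
# vector, because each component was embedded on its own.
novel_pair = ("cell_type_A", "perturbation_Y")
cond = condition_vector(*novel_pair)
```

In a sketch like this, `cond` would be passed to the denoising network alongside the partially masked expression tokens. Whether the resulting prediction is accurate for an unseen pair depends on what the trained model has learned, which this toy example does not address.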