A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following
Yin Fang, Xinle Deng, Kangwei Liu, Ningyu Zhang, Jingyang Qian, Penghui Yang, Xiaohui Fan, Huajun Chen
2025-01-15

Summary
This paper introduces InstructCell, an AI tool that helps scientists study individual cells more easily. It combines language understanding with the ability to analyze complex biological data from single cells.
What's the problem?
Scientists use a technique called single-cell RNA sequencing to study how genes work in individual cells. But the data from this technique is like a super complicated language that's hard to understand and work with using regular computer tools. This makes it tough for researchers to get the information they need from their experiments.
What's the solution?
The researchers created InstructCell, which is like a smart assistant for scientists. It can understand both normal human language and the complex 'language' of cell biology. They trained InstructCell on a huge collection of cell data and instructions written in regular language. This allows scientists to ask questions or give commands in plain English, and InstructCell can then analyze the cell data and give useful answers. It can do important tasks like figuring out what type of cell they're looking at or predicting how cells might react to different drugs.
Why does it matter?
This matters because it makes studying individual cells much easier and more accessible for scientists. Instead of needing to be experts in complicated data analysis, researchers can now just ask questions in regular language and get answers. This could speed up research in areas like understanding diseases, developing new medicines, or learning how our bodies work at the tiniest level. By making this kind of research easier, InstructCell could lead to new discoveries that improve our health and understanding of biology.
Abstract
Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. In the life sciences, single-cell RNA sequencing (scRNA-seq) data serves as the "language of cellular biology", capturing intricate gene expression patterns at the single-cell level. However, interacting with this "language" through conventional tools is often inefficient and unintuitive, posing challenges for researchers. To address these limitations, we present InstructCell, a multi-modal AI copilot that leverages natural language as a medium for more direct and flexible single-cell analysis. We construct a comprehensive multi-modal instruction dataset that pairs text-based instructions with scRNA-seq profiles from diverse tissues and species. Building on this, we develop a multi-modal cell language architecture capable of simultaneously interpreting and processing both modalities. InstructCell empowers researchers to accomplish critical tasks, such as cell type annotation, conditional pseudo-cell generation, and drug sensitivity prediction, using straightforward natural language commands. Extensive evaluations demonstrate that InstructCell consistently meets or exceeds the performance of existing single-cell foundation models, while adapting to diverse experimental conditions. More importantly, InstructCell provides an accessible and intuitive tool for exploring complex single-cell data, lowering technical barriers and enabling deeper biological insights.
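To make the idea of pairing text-based instructions with scRNA-seq profiles more concrete, here is a minimal Python sketch of what one such instruction sample might look like. It is an illustration only, not the authors' data format or code: the `InstructionSample` class, the `<cell>` placeholder, the template wording, the `build_sample` and `top_genes` helpers, and the toy gene counts are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical marker showing where a cell's expression profile would be
# injected into the text instruction. Not taken from the paper.
CELL_PLACEHOLDER = "<cell>"

# Hypothetical instruction templates, one per task named in the abstract.
TEMPLATES: Dict[str, str] = {
    "cell_type_annotation": f"What type of cell is this? {CELL_PLACEHOLDER}",
    "drug_sensitivity_prediction": (
        f"Is this cell sensitive or resistant to the drug? {CELL_PLACEHOLDER}"
    ),
}


@dataclass
class InstructionSample:
    """One text instruction paired with one single-cell expression profile."""
    instruction: str               # natural language request
    expression: Dict[str, float]   # gene symbol -> expression count (toy values)
    answer: str                    # expected textual response


def build_sample(task: str, expression: Dict[str, float], answer: str) -> InstructionSample:
    """Pair a task's instruction template with a cell profile and a label."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return InstructionSample(TEMPLATES[task], dict(expression), answer)


def top_genes(expression: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k most highly expressed genes, ties broken by gene name."""
    ranked = sorted(expression.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


# Toy profile with made-up counts, purely for illustration.
sample = build_sample(
    "cell_type_annotation",
    {"CD3E": 12.0, "CD19": 0.0, "MS4A1": 1.0, "CD3D": 9.0},
    "T cell",
)
print(sample.instruction)
print(top_genes(sample.expression, k=2))
```

In a real system the expression profile would be a high-dimensional vector encoded by the model rather than a small dictionary; the sketch only shows how a text instruction, a cell profile, and a target answer could travel together as one training example.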