Accelerating COVID-19 Research with New Single-Cell Technologies

Editor’s Note: This post has been updated as of 7/26/22 with the new name of CZ CELLxGENE (formerly cellxgene).

The COVID-19 pandemic demonstrates just how swiftly science can move. Much of the credit belongs to scientists’ ingenuity and grit, but single-cell biology has played a part too: new single-cell datasets, along with the tech tools that support their analysis and the dissemination of results, are helping unlock the mysteries of COVID-19.

Chan Zuckerberg CELL by GENE Discover (CZ CELLxGENE Discover) gives researchers the power to visualize single-cell datasets and unearth interesting biology in an accessible way. Here, lung tissue data from Tabula Sapiens, a reference set of cells collected from many different tissues, is colored by cell type and filtered to isolate immune cells for further analysis.

The field of single-cell biology has blossomed over the past five years and added a robust set of experimental and computational tools to a biologist’s toolbox. Researchers can now measure the expression of every gene in the human genome in any cell in the human body. These measurements provide an extraordinary view into what makes an individual cell unique and how it functions in health, infection, and disease.

However, scientists and clinicians still struggle to access the single-cell datasets they need and use them to make new discoveries. The quickly moving global health crisis accentuated the problem.

The single-cell community responded to these needs with new resources, including Chan Zuckerberg CELL by GENE (CZ CELLxGENE), a data publishing and exploration platform for single-cell biology. Developed by the Chan Zuckerberg Initiative (CZI) Science Technology group, CZ CELLxGENE makes it easier for researchers without computational training to find and analyze single-cell data in common formats.

Visualizing the cells most vulnerable to COVID-19

By analyzing gene expression patterns using CZ CELLxGENE, researchers have identified cells in the nose, lung (pictured), and saliva that are vulnerable to SARS-CoV-2 infection. Photo provided by VIB-UGent.

Early in the pandemic, researchers at the Human Cell Atlas (HCA), a global community whose mission is to create an open, shareable reference atlas of every cell in the human body, realized they could leverage their data to decipher the biology of COVID-19. For example, lung tissue data that could help them understand gas exchange at the single-cell level could also reveal the cells most susceptible to attack by SARS-CoV-2, the virus that causes COVID-19.

By April 2020, one research team had used HCA data and CZ CELLxGENE to identify cell types in the nose that could play a role in the transmission of SARS-CoV-2.

“We used CZ CELLxGENE to visualize the single-cell data from these large-scale atlas studies to help pinpoint the cells involved. These studies have shed light on mechanisms for infection, and also informed prevention strategies aimed at reducing the spread, such as wearing masks over the nose and highlighting potential aerosol transmission,” says Sarah Teichmann, a senior group leader at the Wellcome Sanger Institute and co-chair of the Human Cell Atlas Organizing Committee. The publication reporting these results has been cited by more than 1,000 other studies.

The results demonstrated the power of having a reference atlas of healthy human cells — and a scalable way to visualize it.

The HCA team used CZ CELLxGENE Discover, which enables biologists to rapidly characterize the expression of genes, pathways, and molecular mechanisms. It makes it possible to interrogate large single-cell datasets and ask questions about human biology.

Single-cell studies routinely measure the expression of tens of thousands of genes in more than 1 million cells, which amounts to tens of billions of observations. The HCA Data Portal, for example, already contains measurements from more than 14 million cells.
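
For a rough sense of that scale, the number of measurements is simply the number of cells multiplied by the number of genes measured in each. The sketch below assumes about 20,000 genes per cell, an illustrative figure that is not taken from any specific dataset; the function name is hypothetical.

```python
# Back-of-the-envelope size of a single-cell expression matrix.
# GENES_PER_CELL is an assumed, illustrative value ("tens of thousands").
GENES_PER_CELL = 20_000


def observations(n_cells: int, n_genes: int = GENES_PER_CELL) -> int:
    """Number of gene-by-cell measurements in a dense expression matrix."""
    return n_cells * n_genes


# Cell counts quoted in the text: a 1-million-cell study and the
# 14 million cells in the HCA Data Portal.
for label, n_cells in [("1 million cells", 1_000_000),
                       ("14 million cells", 14_000_000)]:
    print(f"{label}: {observations(n_cells):,} observations")
```

Under that assumption, 1 million cells correspond to 20 billion measurements and 14 million cells to 280 billion, which is why a scalable way to visualize the data matters.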

Powering COVID-19 research with reusable data

In another study, HCA researchers identified how the immune response differs between patients with severe COVID-19 and those with no symptoms. The results help explain the progression of COVID-19 to severe disease.

“This was the largest single-cell study of COVID-19 immunity to date with about 800,000 immune cells, and CZ CELLxGENE was a very helpful tool to visualise the high-dimensional data sets,” says senior author Muzlifah Haniffa, Professor of Dermatology and Immunology at Newcastle University and Associate Faculty at the Wellcome Sanger Institute.

Led by Haniffa, the team collected and analyzed blood samples from patients with COVID-19 at three medical centers in the United Kingdom. Datasets like these from patient donors and the HCA’s datasets from healthy donors are now part of the COVID-19 Cell Atlas, a community-driven resource. Gene expression data from diverse tissues can be downloaded and visualized using CZ CELLxGENE Discover.

“One of the powers of having all these COVID-19 datasets hosted on CZ CELLxGENE Discover is you can open your browser and immediately start running experiments. For example, if you have a large collection of tissues, you can see the cell types that are expressing specific genes and nail down what tissues you want to focus on, design which culture systems you want to use, or decide on the appropriate organoid for studying COVID-19,” says Angela Pisco, a CZ CELLxGENE user who is Associate Director of Bioinformatics at the Chan Zuckerberg Biohub (CZ Biohub). “This is something that would take much longer to do in the lab than it does in silico using CZ CELLxGENE.”

Preparing for future diseases

Researchers can reuse the data in CZ CELLxGENE Discover.

When COVID-19 hit, the individual single-cell datasets for normal healthy tissues were decentralized and in different formats. It was a challenge to organize them so that different datasets could be combined to gain the statistical power needed to answer the most urgent questions about the disease. However, the single-cell community rallied to contribute both published and unpublished data to the fight against COVID-19 and took on the task of compiling them on a rapid timeline.

Next time, we want to be better prepared.

At CZI, we are building CZ CELLxGENE Discover, where researchers can share their single-cell data, enabling biologists and physicians to rapidly answer questions about the functions of cell types, genes, and pathways. We work with contributors to ensure that all datasets in the Collections Index are annotated and standardized. That way, other researchers can reuse them without the costly and time-consuming data wrangling that reuse usually requires.

“To go to the CZ CELLxGENE Discover portal, where people have openly annotated data and shared it back with the community is incredible — no other tool is leveraging community input in the same way,” says Bruce Aronow, Co-director of the Computational Medicine Center at Cincinnati Children’s Hospital Medical Center.

A distinguishing feature of the CZ CELLxGENE approach is data integration. Within the Collections Index, researchers will find datasets from multiple research teams that they can easily integrate to test their hypotheses. For example, Aronow’s team created and published a “reusable datamine” to CZ CELLxGENE that combines single-cell data from blood and respiratory tract samples from people with COVID-19, influenza, and other diseases. Other researchers can use the datamine to investigate the body’s immune response to COVID-19, including how it differs from the response to influenza.

“Even though there is a ton of [COVID-19] data out there, we are still underpowered for understanding what is driving the different versions of the disease. That’s where CZ CELLxGENE is necessary. The only chance to get organized around all this is to be able to put data together from multiple sites and multiple studies,” says Aronow.

By using a large, integrated dataset like the datamine, researchers can be more confident that their results are meaningful. “Meta-analysis is like going from a little home telescope to the Hubble Space Telescope. If you can see much more detail in those galaxies that are far beyond, then your understanding is significantly improved,” says Aronow.

As researchers worldwide characterize the multitude of cell types in the human body and their relationships to one another, we will continue to add to the collections in CZ CELLxGENE Discover. Our vision is to make high-quality single-cell data accessible to all scientists so they can work out the cellular mechanisms and interactions in any disease — including future pandemics. In the meantime, tools like CZ CELLxGENE will continue to help scientists piece together the puzzle of COVID-19, one cell at a time.

We’re actively improving CZ CELLxGENE Discover, including our ability to accept new data types. Researchers interested in helping us test and improve the platform can reach out at

Written by Lindsay Borthwick



Chan Zuckerberg Initiative Science

Supporting the science and technology that will make it possible to cure, prevent, or manage all diseases by the end of the century.