Meet the Single-Cell Scientists Mapping Cells in the Human Body
How researchers are navigating the challenges and promises of building single-cell catalogues to better understand disease
Lisa Dratva, a doctoral student at the Wellcome Sanger Institute, studies how people’s immune systems react to COVID-19. She wants to know which T cells activate and fight the virus that causes the infection. To investigate this, Dratva would typically need to spend many years and significant funding building a comprehensive dataset of samples, collecting cells from both sick and healthy individuals to make comparisons.
But Dratva now has a game-changing shortcut: a reference database that already contains synthesized information on millions of immune cells gathered from more than 2,000 people, including some with COVID-19. This atlas of individual cells, curated by scientists at the Sanger Institute and other researchers from around the world, is part of the global Human Cell Atlas (HCA) consortium. The HCA, which is supported by the Chan Zuckerberg Initiative (CZI) and other funders, is a groundbreaking, scientist-led effort to chart every cell type in the human body, something that has never been done before.
“A few years ago, cell atlases like this did not exist,” said Dratva, who specializes in computational techniques. “The availability of usable data is proving to be super powerful, especially as these atlases become bigger and more complete.”
Across the world, researchers are mapping human organs such as lungs and kidneys. One group is cataloging cells in the heart. Another is focused on the motor cortex. Others are building multi-organ atlases, which can lead to new insights and clarify cross-tissue questions. Despite their different focus areas, all of these researchers share a common goal: to provide a clearer picture of the human body at the resolution of single cells, which could transform our understanding of health and disease.
Earlier this year, Dratva joined colleagues from efforts such as these at a CZI-hosted workshop. Participants not only discussed the creation of single-cell atlases but worked collectively to improve the computational methods necessary to compile and integrate data.
“We’re adding pieces to the jigsaw puzzle that is the human body,” said workshop participant Simone Webb, a postdoctoral researcher at Newcastle University and visiting scientist at the Sanger Institute, who is working on the HCA. “Right now, we’re still trying to figure out where the gaps are, but, ultimately, we want to have many completed puzzles representing the biologies of a diverse range of people.”
Collaborating at the workshop, researchers explored the challenges and promises of their burgeoning field, from the sizes of datasets to the nuances of distinguishing different types of cells. Participants shared new techniques for integrating data, many fueled by advances in machine learning, such as tools emerging from scverse, a collection of Python libraries for single-cell analysis. They talked about their experiences with new tools for exploring and deriving insights from large volumes of data, such as Chan Zuckerberg CELL by GENE (CZ CELLxGENE), an open-source tool built by CZI that allows scientists to visually explore and annotate high-dimensional single-cell datasets. They also discussed the importance of having a deep understanding of experimental techniques, gained by building bridges between the computer scientists and computational biologists writing code that integrates data and the biologists at the bench.
Extensive preparation went into the workshop in partnership with the Lattice team at Stanford’s Cherry Lab, which collaborates with CZI on setting standards and curating data submissions for CZ CELLxGENE. Prior to the workshop, participants helped identify specific blood and kidney datasets of value. The majority of these datasets were already in CZ CELLxGENE, and the Lattice team worked with participants to curate and standardize the remaining datasets. As a result, the time and effort it typically takes to gather files and experimental details from each included study were dramatically reduced. This allowed a more focused analysis of the types and states of cells in blood and kidney samples at the workshop.
Those involved celebrated the value of such work and the promise cellular maps hold not only for accelerating biological research but for improving our understanding of diseases.
“Even though the cell is the fundamental unit of life and all diseases have cellular mechanisms, the body is still very poorly understood at the resolution of cells,” said Jonah Cool, Science Program Officer, Single-Cell Biology program at CZI. “Cell atlases that capture the diversity and unique features of cells should therefore put us in a much better position to improve human health, helping us to develop targeted therapies to treat or manage disease.”
Consider the work of Lisa Sikkema, a graduate student at Helmholtz Munich who is contributing to the largest effort to map every cell type in the lung. To do so, Sikkema is compiling data from a variety of previous studies on lungs produced by dozens of labs around the world. These comprehensive datasets have helped Sikkema compare gene expression levels in cells, revealing categories ranging from types of epithelial cells, which facilitate gas exchange, to types of stromal cells, which provide connective tissue. Through Sikkema’s work, other researchers can now ask questions about the behaviors and interactions of these cells in healthy and diseased states.
“So many scientists are generating datasets in their own research bubbles,” said Sikkema. “I saw the huge potential of reusing and integrating all the datasets that are out there and making sure that anyone can access this information.”
The datasets of human lung cells that Sikkema is working with were not designed to be brought together. So, she’s focused on identifying and weeding out variation in the data caused by differences in methodologies and techniques, while leaving intact the biological variation that distinguishes one type of cell from another.
“The main problem is trying to separate biological variation, which helps us to identify different cell types, from batch effects caused by technical differences in how researchers collect and process their data,” said Karin Hrovatin, Sikkema’s colleague at Helmholtz Munich, who focuses on beta cells in the pancreas.
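To make the distinction concrete, here is a minimal sketch of one very simple correction: subtracting each batch's mean expression. It is not the method used by the researchers quoted here. The expression values, the batch names "A" and "B", and the helper functions are all made up for illustration, and the approach assumes every batch contains the same mix of cell types, which real datasets rarely guarantee.

```python
from collections import defaultdict
from statistics import mean

# Toy data (entirely made up): one gene's expression in cells from two labs.
# Each record is (batch, cell_type, expression). Lab "B" reads about 2.0
# higher across the board, standing in for a technical batch effect.
cells = [
    ("A", "club", 1.0), ("A", "club", 1.2),
    ("A", "goblet", 3.0), ("A", "goblet", 3.2),
    ("B", "club", 3.0), ("B", "club", 3.2),
    ("B", "goblet", 5.0), ("B", "goblet", 5.2),
]


def center_by_batch(records):
    """Subtract each batch's mean expression from the cells in that batch."""
    by_batch = defaultdict(list)
    for batch, _, value in records:
        by_batch[batch].append(value)
    batch_means = {b: mean(v) for b, v in by_batch.items()}
    return [(b, ct, value - batch_means[b]) for b, ct, value in records]


def mean_by(records, key_index):
    """Average expression grouped by batch (index 0) or cell type (index 1)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {k: mean(v) for k, v in groups.items()}


corrected = center_by_batch(cells)
batch_means_after = mean_by(corrected, 0)  # technical shift between labs
type_means_after = mean_by(corrected, 1)   # biological difference between types

print(batch_means_after)
print(type_means_after)
```

In this toy case the offset between the two labs disappears after centering, while the gap between club and goblet cells stays the same. If one lab had sampled mostly goblet cells, the same subtraction would erase part of the biological signal, which is why integration in practice relies on more careful methods.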
There are new tools that are helping the field move toward consensus, said Malte Lücken, principal investigator at Helmholtz Munich and another colleague of Sikkema’s. While integrating data on the lung, he found that cells had been held to different standards by the different teams that had generated the data. For example, inconsistent thresholds had been set for deciding whether cells in the lining of an airway were shaped like a club or a goblet, a distinction important for distinguishing two separate cell types that secrete different materials. To resolve this inconsistency, his group turned to CZ CELLxGENE.
Through CZ CELLxGENE’s Annotate tool, Lücken created a visualization of the data that served as a starting point for a conversation between the researchers to harmonize and standardize the data.
“We had encountered a lot of disagreement in the data,” said Lücken. “What we generated in CZ CELLxGENE Annotate enabled us to annotate the cellular diversity in the lung in a consensus fashion.”
The growing abundance and accessibility of data have both accelerated the work of single-cell biologists and created new challenges. Information on millions of cells can quickly add up to outsized datasets that are difficult to search, manipulate, and model. That’s why Dinithi Sumanaweera and others are working on approaches to reduce complex datasets into smaller, simpler, but still useful representations that fairly reflect the full data.
“When you have a big dataset containing millions of cells, you need to handle it effectively,” said Sumanaweera, a Marie Curie postdoctoral fellow at the Sanger Institute. “Talking with the biologists I work with helped me think about what is important to include in the reductions that make such datasets easier for them to use.”
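As a rough illustration of what a reduction can look like, the sketch below collapses per-cell counts into one average profile per cell type. This is a generic summarization chosen for simplicity, not the approach Sumanaweera is developing. The gene names `GENE1` and `GENE2`, the counts, and the `summarize_by_type` helper are hypothetical.

```python
from collections import defaultdict

# Toy input (made up): each cell has a type label and counts for two
# hypothetical genes.
cells = [
    {"cell_type": "T cell", "counts": {"GENE1": 4, "GENE2": 0}},
    {"cell_type": "T cell", "counts": {"GENE1": 6, "GENE2": 2}},
    {"cell_type": "B cell", "counts": {"GENE1": 0, "GENE2": 8}},
    {"cell_type": "B cell", "counts": {"GENE1": 2, "GENE2": 10}},
    {"cell_type": "B cell", "counts": {"GENE1": 1, "GENE2": 9}},
]


def summarize_by_type(records):
    """Collapse per-cell counts into one mean profile per cell type."""
    totals = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for rec in records:
        n_cells[rec["cell_type"]] += 1
        for gene, count in rec["counts"].items():
            totals[rec["cell_type"]][gene] += count
    return {
        ct: {
            "n_cells": n_cells[ct],
            "mean_counts": {g: t / n_cells[ct] for g, t in genes.items()},
        }
        for ct, genes in totals.items()
    }


summary = summarize_by_type(cells)
for cell_type, info in summary.items():
    print(cell_type, info["n_cells"], info["mean_counts"])
```

Five cells become two rows, which is far easier to search and compare. The cost is that averaging throws away the cell-to-cell variation within each type, so a summary like this suits some questions and not others.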
To design a practical cell atlas, she said, one must first think carefully about what it will be used for. One researcher might be interested in universal features of cells that we all have in common. Another might want to compare cells from people of different ages or cells in different parts of an organ. Trade-offs must be made, said Sumanaweera, because no single model is likely to capture all of this information.
Sumanaweera and other workshop participants highlighted that their field is still young and growing, trying different things and seeing what works. For Adam Gayoso, a PhD candidate studying computational biology at UC Berkeley, connecting with others in the field about the same problems he’s facing has helped him discover new ways to think about solutions. He’s also been able to find new collaborators, people he’s now working with who had previously been only names on research papers.
“One thing that became clear in the CZI workshop was that there were diversified visions of what building single-cell tissue atlases really means,” said Gayoso. “But everyone also thinks that establishing community consensus is important, and we’re excited to move forward together.”
He and others agree that being in the same room with other workshop participants catalyzed this effort.
“The scale of these datasets and the expertise required to generate single-cell atlases has historically resulted in a decentralized, distributed scientific effort,” said Cool. “When you bring researchers together, there are new lessons, new learnings that can further our efforts to understand the cellular mechanism of diseases.”
CZI’s Assembling Tissue References Workshop took place in May 2022. The event convened leaders of the single-cell community with an interest in assembling a large number of datasets generated by many different labs into a harmonized reference. CZI’s single-cell biology team would like to give special thanks to Rahul Satija for co-organizing the event.