Pointing the Way in Single-Cell Analysis
A Conversation with Rahul Satija on the Seurat Software Toolkit
Within the last decade, techniques for collecting and analyzing single-cell data have come to constitute a major, fast-moving field in biomedicine. The ability to tease out molecular differences between individual cells in large populations is providing essential information on cellular function in health and disease, allowing researchers to glean insights on protein expression and many other variables across different cell states and modalities.
Collecting complex data on single cells is one matter. Being able to integrate and interpret that data is another — which is where the work of genomicist and computational biologist Rahul Satija comes in.
In 2015, Satija — then at the Broad Institute in Cambridge, Massachusetts — and colleagues (including his postdoctoral supervisor, Aviv Regev) launched the first version of Seurat, a software toolkit for biomedical researchers. Named for the famed pointillist artist Georges Seurat, who painstakingly combined discrete dots of paint into a unified image, the open source software harmonizes and integrates multiple single-cell datasets. It can, for example, help researchers combine data collected using disparate experimental methods from varying cell populations. At the time of Seurat’s initial publication, Satija moved to the New York Genome Center (NYGC) to start his own lab.
Seurat is downloaded by researchers worldwide upwards of 20,000 times a month, fulfilling its designers’ intentions of helping users “identify and interpret sources of heterogeneity from single-cell transcriptomic measurements, and integrate diverse types of single-cell data.” Earlier this month, the Satija Lab released Seurat v4, along with a preprint describing new methods for multimodal data analysis.
In addition to his work at the NYGC, Satija is expanding his engagement with single-cell research as the director of the newly formed Center for Integrated Cellular Analysis. This organization, encompassing six institutions across New York City, will develop technologies and open source software to advance the identification of different cellular states and the factors that govern their behavior and function.
While Satija was growing up in Potomac, Maryland, his interest in genetics was fueled by his proximity to the National Institutes of Health and The Institute for Genomic Research (now part of the J. Craig Venter Institute) during the race to sequence the human genome. At Duke University, he double-majored in biology and music and minored in mathematics. Satija won a music scholarship to Duke for playing the violin and was concertmaster for the Duke Symphony Orchestra.
Subsequently, on a Rhodes Scholarship, Satija earned his doctorate in statistics from the University of Oxford, U.K., in 2010. He then undertook postdoctoral research at the Broad Institute, pursuing his interest in the emerging area of single-cell analysis.
The following interview has been condensed and edited for clarity.
In December 2013, single-cell sequencing was selected by Nature Methods as the Method of the Year. At that time, what was your involvement in the field?
Rahul Satija: When I started my postdoc at the Broad, I didn’t have any intention of working on single-cell analysis. I was interested in working on methods for low-input RNA-seq, for example, from rare populations of sorted cells.
I was very fortunate to work with Alex Shalek, Aviv Regev, and Joshua Levin, who led some of the first pilot work at the Broad Institute for single-cell sequencing experiments. We went in without a lot of expectations, and part of the goal of the first experiment was to benchmark a new protocol. However, once we got the first dataset back, we were surprised (and excited) to see that a population of cells that we had assumed was ‘homogeneous’ was in fact quite diverse, particularly in the way that the cells responded to an immune stimulus.
Even though the first datasets we generated were quite small, it was clear that the potential for the field was incredibly exciting. While it was an eye-opening moment for me, we were building upon the work of Fuchou Tang, Sten Linnarsson, Rickard Sandberg and others who had been pursuing single-cell RNA-seq since 2007, and are some of the earliest pioneers of this field.
How would you characterize the evolution of the single-cell field since then?
Satija: Over the last few years, single-cell genomics has transitioned from a new method that a few specialized labs can implement to a robust technology that biologists can routinely use to answer new questions. It’s exciting to see manuscripts using single-cell genomics make fundamental discoveries in many contexts — such as immunology, neuroscience, and developmental biology.
There are many examples not only of new cell types that researchers have found, but also new biomarkers of disease, previously undiscovered developmental pathways, and cell-state transitions that have been identified.
How does your work fit in — especially the Seurat software toolkit for analyzing single-cell data?
Satija: A particular challenge arises when you have many different datasets to analyze together. For example, it’s challenging to compare datasets from people who are healthy and people who have diseases, datasets generated with different technologies, and even datasets across different species.
The challenge of pooling information across different datasets to maximize analytical power has been a really interesting methodological problem that we’ve focused on for the past few years. In 2013, researchers typically didn’t have many datasets to work with, but in the last few years especially, aggregating multiple datasets has been an extraordinarily pressing and crucial challenge for many biologists. My lab, along with the lab of John Marioni at the European Bioinformatics Institute, reported some of the first methods that could be used for dataset integration. Since then, an entire subfield of developing methods for single-cell data integration has taken off, with dozens of innovative and powerful algorithms and software packages being released. It’s been fun and rewarding to have been part of these advances from so many groups in the community.
Your lab will lead the new Center for Integrated Cellular Analysis. How does your work connect?
Satija: A couple years ago, as it became clear that single-cell transcriptomic analysis was having tremendous impact, we felt the next frontier was going to be to move beyond looking at gene expression in single cells.
RNA or gene expression is, of course, tremendously valuable and informative to measure in single cells. But the intense focus on single-cell RNA-seq has largely been because the technology is becoming more and more robust, and measuring RNA abundance is no longer particularly challenging. Yet there are many other aspects of single cells, such as their chromatin state, spatial location, lineage information, and protein expression, that are essential to measure. Doing so is not quite as straightforward as profiling RNA expression, but a vision of integrated single-cell multimodal omics was shared by a number of groups at the NYGC; single-cell multimodal omics was also named by Nature Methods as the Method of the Year for 2019.
My lab is particularly interested in designing computational methods to be able to handle these multi-omic datasets. We work closely with Peter Smibert, the director of the Technology Innovation Lab at the NYGC, and Dan Landau, another principal investigator associated with the NYGC.
All three of us were thinking about this problem in our own lanes and decided to join forces. There are actually nine labs involved in the center across New York City. And all of us are working towards this kind of vision, where instead of measuring just one thing — that is, just performing single-cell RNA sequencing — we want to be able to measure as much as we possibly can, all in the same cells and all at the same time. And, of course, that’s going to require technology development, as well as the development of some very exciting computational algorithms to make sense of all that data.
We were very fortunate this year to be selected as a Center of Excellence in Genomic Science (CEGS) by the National Human Genome Research Institute. The CEGS program funds interdisciplinary research teams with a common vision to advance genomic science. We hope that our center will develop technologies and algorithms that enable the broader community, and look forward to sharing them openly.
How has open science been important to your work?
Satija: One of the aspects I enjoy about working in methods development for single-cell genomics is that the application of our research is not specific to a single field like neuroscience or immunology, but can apply to a huge diversity of biological systems. And so many fields can benefit from new methods and technologies, but only if the science is truly open and accessible.
Seurat is downloaded upwards of 20,000 times a month. We receive a huge number of user questions that require support and documentation, along with feature requests, bug fixes, and contributions from the community that don’t necessarily require new statistical methods, but involve supporting users in applying the functionality that we already offer. My lab and many others typically receive grants to develop new statistical methods, yet there has never been a dedicated funding source for community support and engagement.
I received funding from CZI’s Essential Open Science Software (EOSS) program for the development of Seurat. What’s been unique about our involvement with the EOSS program is that it provides resources for us to engage, support, and enable our users, even without necessarily adding functionality. That is a wonderful distinction, and it has had a real impact on our work and on that of so many of the EOSS grantees.
Even before EOSS, I was also part of CZI’s first two grant programs for projects to develop pilot technologies and computational methods for the Human Cell Atlas. All the CZI programs that we’ve been involved in have emphasized the importance of open code, open data, and open science. They’ve repeatedly incentivized and rewarded groups that follow these principles, and it’s leading to a wonderful cultural shift that extends far beyond single-cell genomics.