Computational Biology at CZI
How we work with and for the scientific community
Biology has entered a data-driven era. Our ability to collect data has begun to outpace our ability to analyze and understand it. This shift poses enormous challenges for computation, and it opens exciting new ways for philanthropy to help meet them.
Through advances in imaging, sequencing, and physiology we can: capture a movie tracking all cells through a developing embryo; assemble multimodal molecular profiles of millions of single cells; or record the firing activity of thousands of neurons across multiple brain areas. Even simple algorithms to preprocess data — extracting the signal from the noise — now need to run at massive scale and high throughput. Entirely new analyses are needed to interpret what are, increasingly, probabilistic measurements. It’s not, “this one cell has this one function.” Rather, it’s, “this group of cells traverses a continuous range of characteristics that collectively define its function.”
Making sense of this complexity requires a coordinated computational ecosystem — algorithm developers, theorists, software tools, and shared infrastructure all working together. Building and supporting that ecosystem is becoming a central challenge.
Accelerating science through collaboration
At the Chan Zuckerberg Initiative (CZI), we aim to accelerate science. Our primary levers are to fund and build: we give grants to experimental and computational scientists and software developers, and we build software with and for the scientific community. But how do we understand the computational bottlenecks that scientists face on the ground? How do we catalyze coordination within a software or scientific ecosystem? How do we ensure we’re not duplicating existing efforts?
Our computational biology team plays a key role in CZI’s unique blend of funding and building transformative technology. Computational biology is emerging as a third pillar that glues together CZI’s grantmaking and software engineering, providing continuity in collaborations — both within our organization, and with external scientists.
Internally, we help translate across the wide diversity of perspectives that CZI brings under one roof: scientists, program managers, software engineers, product designers, and UX researchers. We know enough about biology to ground discussions in domain expertise, but also know enough about software to explain why an API design is confusing or to highlight limitations of a data format. This unique blend of expertise helps glue our organization together, and everyone is learning in the process.
At the same time, we collaborate closely with external scientists (frequently, but not exclusively, those funded through our grant programs) to explore challenges and help identify solutions. The nature of the work, and the solution, varies by domain. If a group of algorithm developers are struggling to assemble common benchmarks for machine learning, we might help curate datasets. If a group of labs are all running similar but slightly different computational pipelines and can’t quite reproduce each other’s results, we might help unify those pipelines. If a single lab is struggling to analyze a particularly complex multimodal dataset, we’ll get hands-on and help them find a solution, and then figure out if the solution generalizes. In these efforts, we are motivated not to get academic credit, but to support the scientific ecosystem.
We bring these experiences back to the entire CZI science team to incubate new ideas. These ideas might lead to working with our software engineers and product team to turn an early-stage prototype into a robust, reusable tool. They might also mean partnering with our grantmaking team to frame a new program that will help support or advance an area of computational science or open-source software. Or, we might learn through our explorations that the community is solving the problem on its own, and we can move on to the next idea.
Collaboration and consensus on image analysis
To better explain our approach, I want to highlight a project underway that exemplifies computational biology at CZI. The project deals with an exciting, emerging technology called “image-based transcriptomics” for measuring spatially resolved gene expression profiles within cells and across tissues. These measurements can be linked to other measures of cellular structure and function to obtain a better understanding of cellular biology in both health and disease. But significant technical challenges remain around storing, analyzing, sharing, and comparing the imaging data obtained from these methods.
A particular challenge is that many variants of this technique are in development, and it is hard to compare them, given the diversity in both the experimental methods and the data analysis. Funded by grants from CZI, a grassroots effort called SpaceTx formed among developers of many of the leading techniques. Each group is applying their method to common samples from partitioned human brain tissue, in order to generate a publicly available reference dataset for comparative analysis and benchmarking. This consortium is coordinated by the Allen Institute for Brain Science, under the umbrella of the Human Cell Atlas project, and data collection is currently underway.
Early on, Deep Ganguli on CZI’s computational biology team became interested in the analysis of these data. So he started collaborating. He visited labs, talked to grad students and post-docs, and looked at their data and code with them. On an early visit, a student mentioned how unusual, and wonderful, it was that someone else was actually interested in her code. By working closely with these groups, Deep realized that, while each lab was processing their data differently, there were commonalities, and it might be possible to express all the methods through a single unified pipeline. That idea gave rise to a project called Starfish (the name is a play on the fact that the “star”, or * symbol, is a wildcard, and nearly all the various experimental methods are some form of FISH, e.g. MERFISH, seqFISH, smFISH, and osmFISH).
Starfish is now a collaborative team that brings together computational biologists and software engineers at CZI, and a variety of outside contributors. They have worked with the SpaceTx assay developers to arrive at a consensus file format for the raw input data and the analysis results. They have begun building a software library to solve the image processing problems required to analyze the data: registration, image filtering, spot detection, decoding, cell segmentation, and quality control. So far, there’s promise that one standardized software tool can work across several assays, instead of a series of custom packages. By building consensus across the community and defining open standards, the Starfish team ensures that their tool is not just solving one lab’s problem. By collaborating with a diversity of both users and developers, the team is helping make these complex techniques more broadly accessible.
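To make the idea of one shared pipeline a little more concrete, here is a toy sketch of three of the steps named above (image filtering, spot detection, and decoding) chained together in plain Python. It is not the Starfish API, and it is not how any particular lab processes its data. The function names, the tiny 3x3 images, the on/off barcode scheme, and the gene labels GENE_A and GENE_B are all assumptions made up for illustration, and the images are assumed to be already registered across imaging rounds.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]   # a 2D image as nested lists (illustrative only)
Spot = Tuple[int, int]      # (row, col) pixel location


def subtract_background(image: Image, level: float) -> Image:
    """Filtering step: subtract a constant background and clip at zero."""
    return [[max(value - level, 0.0) for value in row] for row in image]


def detect_spots(image: Image, threshold: float) -> List[Spot]:
    """Spot detection step: pixels above threshold that are strictly
    brighter than their up/down/left/right neighbors."""
    spots = []
    n_rows = len(image)
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value <= threshold:
                continue
            neighbors = [
                image[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < len(row)
            ]
            if all(value > n for n in neighbors):
                spots.append((r, c))
    return spots


def decode(
    spots_per_round: List[List[Spot]],
    codebook: Dict[Tuple[int, ...], str],
) -> Dict[Spot, str]:
    """Decoding step: build an on/off barcode per location across rounds
    and look it up in a codebook. Unmatched barcodes are dropped."""
    locations = sorted({spot for spots in spots_per_round for spot in spots})
    calls = {}
    for location in locations:
        barcode = tuple(int(location in spots) for spots in spots_per_round)
        gene = codebook.get(barcode)
        if gene is not None:
            calls[location] = gene
    return calls


def run_pipeline(
    rounds: List[Image],
    codebook: Dict[Tuple[int, ...], str],
    background: float,
    threshold: float,
) -> Dict[Spot, str]:
    """Same stage order for every assay; only the parameters and the
    codebook change from one (hypothetical) assay to another."""
    spots_per_round = [
        detect_spots(subtract_background(image, background), threshold)
        for image in rounds
    ]
    return decode(spots_per_round, codebook)


# Two imaging rounds of a made-up 3x3 field of view.
round_1 = [[1, 1, 1], [1, 9, 1], [1, 1, 8]]
round_2 = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
codebook = {(1, 1): "GENE_A", (1, 0): "GENE_B"}  # hypothetical gene labels

calls = run_pipeline([round_1, round_2], codebook, background=1.0, threshold=3.0)
print(calls)  # {(1, 1): 'GENE_A', (2, 2): 'GENE_B'}
```

The point of the sketch is the shape rather than the algorithms: if every assay can be expressed as the same ordered stages with different parameters and a different codebook, one tool can plausibly serve several methods. Real data would also need the registration, cell segmentation, and quality control steps, which are left out here.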
There are many more projects that we’re working on or exploring. We’re deeply committed to open science and open source (the source code for Starfish was on GitHub the day the project began), so it should be easy to follow along with our work. I’m super excited about the future of computational biology: multimodal datasets, advances in machine learning, closed-loop experimental design, and much more. I hope our team, and our organization, can help contribute.
To learn more about our work in science and to stay updated on funding opportunities, visit our website, where you can sign up for our mailing list. You can also follow us on Twitter. To learn more about our science team, follow the CZI science blog. And you can always reach us at email@example.com.
Jeremy Freeman, Director, Computational Biology
Jeremy is a scientist at the intersection of biology and technology. He wants to understand how biological systems work, and use that understanding to benefit both human health and the design of intelligent systems. He studied computational vision in grad school at NYU, led a neuroscience research lab at HHMI’s Janelia Research Campus, and is currently at the Chan Zuckerberg Initiative leading CZI’s work at the intersection of computation and biology. He is passionate about open source and open science, and bringing scientists and engineers together across a range of fields.