The Invisible Foundations of Biomedicine

Supporting 42 Essential Open Source Software Projects that Drive Science

Earlier this year, the world witnessed a major scientific breakthrough — the first image ever produced of a black hole. This fiery doughnut captured our collective imagination. As the photo circulated the planet, scientists, policy makers, and the public alike celebrated this incredible achievement.

A team of almost 350 astronomers, engineers, and data scientists worked for years using raw data from powerful telescopes and advanced imaging software to reconstruct the stunning image. What the image alone doesn’t convey is how much invisible labor went into its creation. As Dr. Katie Bouman, one of the team leads, noted in her keynote address at the Swiss Federal Institute of Technology in Lausanne, more than 20,000 people participated in the creation of the open source software that made this breakthrough discovery possible.

One of the projects Dr. Bouman and her team used — Matplotlib, a Python library for 2D plotting and a sponsored project of NumFOCUS — is emblematic of the challenges faced by many open source software packages that are essential to modern science. The project is named as a dependency by over 140,000 other code repositories. If software dependencies counted as citations, Matplotlib and its contributors would be considered one of the most impactful projects in the history of science. Yet the project’s lead developer is currently supported for only four to eight hours a week by their employer, and the project has not received any dedicated funding for maintenance — until now.

At the Chan Zuckerberg Initiative, we believe open resources that support collaboration and reuse are critical to accelerate progress in biomedicine. We also believe in the importance of rewarding the creators of invisible but crucial computational infrastructure for reproducible research.

For these reasons, we’re proud to support the computational foundations of biology through our Essential Open Source Software for Science program. To start, we’re providing 32 grants to 42 open source projects, including Matplotlib. The list of selected projects includes common software tools for reproducible research and data analysis, as well as domain-specific libraries used by thousands of researchers in specific fields of biomedicine. Learn more about the grantees.

The day after the SciPy 2019 conference, maintainers and new contributors to open source projects came together for a sprint. Photo provided by Ralf Gommers.

Since we announced this program to support the maintenance of open source software for science, we have received hundreds of applications from tools used across biomedicine, from single-cell biology and genomics to imaging and microscopy, as well as from foundational software. Through an open request for applications (RFA), we received 293 proposals covering a total of 475 open source projects.

The vast majority of these projects (90%) are hosted on GitHub. The Python and R communities were strongly represented, reflecting their status as languages of choice for scientific computing, but many other programming languages appeared as well. Most applications (65%) came from projects hosted at academic institutions, but a sizable share came from industry and non-profits.

Taken together, these 475 projects paint a complex picture of the successes and struggles that scientific open source maintainers face on a regular basis, particularly as their projects mature and are adopted by large numbers of researchers. We learned a lot about these projects through the application process, and we plan to conduct additional research on the pain points and challenges they face, with the goal of sharing what we learn publicly with the community.

As CZI aims to advance progress in biomedicine, we feel a responsibility to invest not only in promising new prototypes that may lead to novel research and applications, but also in strengthening and consolidating the open source computational tools that hundreds of thousands of researchers use every day, recognizing their value, and exploring sustainable models to support their maintenance.

While we’re aware that CZI’s open source program is not a general solution to software sustainability, we hope these grants will help stabilize critical projects and give them much-needed resources for maintenance tasks that rarely get funded: improving documentation and usability, supporting core maintenance, addressing technical debt, and supporting outreach and convening users and contributors.

We want to thank everyone who applied for funding under this program. We learned a great deal about the work many open source projects are doing and will use this knowledge to craft future funding opportunities.

As we celebrate the first cohort of grantees, we’re also announcing the second application cycle. If you missed the opportunity to participate in the first round and you think your project fits with the goals of this program, please consider applying. Applications for the second cycle open on December 16, 2019, and close February 4, 2020.

We want to acknowledge the advisors and reviewers who, along with CZI staff, helped us design the program or participated in the review process:

Mara Averick, Alberto Bacchelli, Amy Bernard, Titus Brown, Anne Carpenter, Scott Chamberlain, David Feng, Martin Fenner, Amel Ghouila, Josh Greenberg, Mahmoud Hashemi, David Haussler, Stephanie Hicks, James Howison, Daniel S. Katz, Mike Keiser, Peter Kharchenko, Molly Maleckar, Debbie Marks, Abby Cabunoc Mayes, Chris Mentzel, Marius Pachitariu, Stephan Preibisch, Jason Priem, Karthik Ram, Danielle Robinson, Matt Rocklin, Stephan Saalfeld, Leah Silen, Arfon Smith, Tracy Teal, Nelle Varoquaux, Luis Villa, Andra Waagmeester, Kirstie Whitaker, and Carol Willing.

To learn more about our work at CZI, visit our website or follow us on Twitter.

Nicholas Sofroniew, Computational Biologist
Nicholas is a computational biologist at CZI, helping to develop, support, and disseminate tools that will accelerate science. He studied mathematics at the University of Cambridge, followed by a PhD and postdoc at HHMI’s Janelia Research Campus working in systems neuroscience, microscopy, and image analysis. At CZI, he focuses on projects in neurodegeneration, imaging, and open science.

Dario Taraborelli, Science Program Officer, Open Science
Dario is a social computing researcher and an open knowledge advocate. As the Science Program Officer for Open Science at CZI, his goal is to build programs and technology to support open, reproducible, and accessible research. Prior to joining CZI, he served as the Director, Head of Research at the Wikimedia Foundation, the non-profit that operates Wikipedia and its sister projects. As a co-author of the Altmetrics Manifesto, a co-founder of the Initiative for Open Citations, and a long-standing open access advocate, he has been designing systems and programs to accelerate the discoverability and reuse of scientific knowledge by scholars, policy makers, and the general public alike.

Chan Zuckerberg Initiative Science

Supporting the science and technology that will make it possible to cure, prevent, or manage all diseases by the end of the century.