Scaling Open Infrastructure and Reproducibility in Biomedicine
The maintenance of scientific open source software is unlike that of scientific papers or other traditional research outputs. Authors do not incur additional costs or labor as their papers are used or cited more. In contrast, the most mature scientific tools and software libraries become more challenging to maintain as their user base grows.
“Code, while it’s being traded, appraised, or exchanged, assumes its static form, with all the properties that we’d expect of a commodity. But once it finds users, code springs to life, switching to an active state and incurring hidden costs.” — Nadia Eghbal, from Working in Public: The Making and Maintenance of Open Source Software
For open source maintainers, popularity and large-scale adoption typically mean dealing with problems that have less to do with developing new functionality. Instead, much of their time goes to onboarding new contributors; rapidly triaging incoming requests; creating safe and inclusive spaces for people to discuss a project’s roadmap; prioritizing security, scalability, or continuous integration; and reducing technical debt. Reducing technical debt and managing communities, however, are not activities that typical scientific grants support. The same problem affects research infrastructure at large.
The tools and infrastructure that provide the computational foundations of biomedical research urgently need support for these less glamorous, but nonetheless critical, tasks in order to continue to grow and deliver value to the scientific community. For this reason, we launched the Essential Open Source Software for Science (EOSS) program in 2019 to give some of the most widely used scientific open source tools much-needed support for these traditionally underfunded activities. In the third funding cycle of this program, we are awarding $3 million through 17 new grants.
In the third cycle, we are supporting several foundational open source projects, such as:
- Galaxy: a web-based computational workbench with a large scientific community and institutional adoption, including three national-level servers that support 250,000 users. The grant will expand Galaxy’s support for biomedical data analysis.
- Read the Docs: an organization that aims to reduce the learning curve for open source software. The grant will make documentation for scientific Python packages more accessible and interoperable.
- Apache Arrow: a cross-language development platform for in-memory analytics, widely adopted in genomics research. The grant will expand its apprenticeship program, which recruits developers from underrepresented groups and trains them to become skilled open source software maintainers.
We are also funding a growing number of domain-specific tools that are the de facto standard in individual fields of biomedical research, such as Monocle and DESeq2, two computational packages widely used in single-cell research. Additionally, we are extending our support to several projects previously funded in cycle 1. You can browse the full list of projects awarded in the third EOSS cycle.
These awards bring the total number of funded projects to 67, and the total commitment of CZI’s Open Science program to funding scientific open source through EOSS to $11.8 million. We are thrilled by the opportunity to collaborate with these maintainers and contributors of key software for science. We look forward to working with the EOSS community and our advisors to bring these maintainers additional visibility, sustainability, tools, and resources to grow a healthy and diverse community.
Our support for open science extends beyond scientific open source projects. We’re also awarding three additional grants to organizations helping scale open infrastructure and reproducible practices: the International Interactive Computing Collaboration, Reproducibility for Everyone, and Invest in Open Infrastructure.
International Interactive Computing Collaboration (2i2c)
A growing number of biomedical researchers and institutions depend on cloud-based technology to perform large-scale data analysis. Although providers of cloud computing infrastructure exist, there is a significant gap in scalable cloud tools and technology designed primarily for researchers and deployable in academic institutions. The International Interactive Computing Collaboration (2i2c) is a new non-profit organization whose mission is to promote interactive computing in research and education through support for Jupyter and related open source technology. We’re supporting 2i2c’s mission with a $1.4 million, three-year grant. The grant will help 2i2c secure the sustainability and growth of Jupyter-based infrastructure for cloud computing and increase its value to academic researchers. Funding will also support 2i2c’s first hire, Georgiana Dolocan, as an Open Source Infrastructure Engineer. Georgiana will support several 2i2c pilot hubs for community colleges, universities, and research institutions, helping these organizations accomplish their missions through 2i2c infrastructure. The grant will also support Chris Holdgraf in building strategic partnerships and collaborations, finding opportunities for Jupyter infrastructure to benefit research and education, coordinating activity in the Jupyter project that benefits these communities, and securing more funding for the development, maintenance, and support of Jupyter technology. Learn more about 2i2c’s mission.
Reproducibility for Everyone (R4E)
As more researchers aim to adopt open computational research practices, there is strong demand to turn these principles into practice and to make the associated skills more broadly accessible to researchers and trainees. In 2019, we partnered with The Carpentries to support their mission to develop a global computational training network and build a repository of high-quality computational lessons. We’re proud to announce a new grant to support the Reproducibility for Everyone (R4E) initiative. R4E runs one- to two-hour workshops that train life sciences researchers in reproducibility tools and best practices and, since its inception in 2017, has trained thousands of life scientists. This $230K grant will enable R4E to hire April Clyburne-Sherin as its first full-time employee, focused on developing resources to create an annual circuit of core conferences. It will also help scale the organization through governance and equity planning, documentation for easier onboarding and automated reporting, and outreach to new volunteers, conferences, and communities.
Invest in Open Infrastructure (IOI)
As more and more open tools and infrastructure become essential to researchers, we need to develop sustainability models and resilient strategies to ensure their long-term availability. The coronavirus pandemic has severely affected academic institutions and their ability to support key infrastructure that is essential to the work of scientists, students, librarians, and other academic staff. We are joining a growing number of funding organizations supporting Invest in Open Infrastructure (IOI), a non-profit initiative dedicated to helping focus investments in the open technology on which research relies. This $80K grant will help IOI and its Executive Director, Kaitlin Thaney, advance its mission by shedding light on challenges, conducting research, and working with funders and institutional decision makers to enact change.
There is still a long way to go to achieve universal and immediate access to all research outputs and knowledge, and to make this access equitable and sustainable. Learn more about our goals and current initiatives in CZI’s Open Science program.
We want to acknowledge several advisors and reviewers who, along with CZI staffers, helped us design the EOSS program or participated in the review process for this funding cycle: Alberto Bacchelli, Amy Bernard, C. Titus Brown, Abby Cabunoc Mayes, David Feng, Amel Ghouila, Allen Goodman, Mahmoud Hashemi, David Haussler, Peter Kharchenko, Julia Lowndes, Molly Maleckar, Debbie Marks, Marius Pachitariu, Karthik Ram, Stephan Saalfeld, and Yo Yehudi.
Dario Taraborelli, Science Program Officer, Open Science
Dario is a social computing researcher and an open knowledge advocate. As the Science Program Officer for Open Science at CZI, his goal is to build programs and technology to support open, reproducible, and accessible research. Prior to joining CZI, he served as the Director, Head of Research at the Wikimedia Foundation, the non-profit that operates Wikipedia and its sister projects. As a co-author of the Altmetrics Manifesto, a co-founder of the Initiative for Open Citations, and a long-standing open access advocate, he has been designing systems and programs to accelerate the discoverability and reuse of scientific knowledge by scholars, policy makers, and the general public alike.