How Open Source Software Contributors Are Accelerating Biomedicine

Hear from 9 People Working on Key Open Source Software Projects for Science

Each day, hundreds of thousands of scientists use open source software to advance biology and medicine, from studying cells in a microscope image to understanding how genes behave in healthy cells. Open source software underpins much of modern scientific research, providing reproducibility, transparency, and opportunities for collaboration. In terms of reuse and adoption, the impact of these tools is on par with some of the most cited papers in science, yet even the most widely used research software often lacks dedicated funding.

Our Essential Open Source Software for Science (EOSS) program was created to support these efforts — from software maintenance to growth, development, and community engagement for open source tools that are critical to science. We asked nine grantees from the first cycle of the EOSS program what drives them to create tools and how their commitment to open source moves science forward.

Olga Vitek, MSstats and Cardinal

Olga Vitek of Northeastern University works on the open source software tools MSstats and Cardinal.

“As a graduate student, I benefited enormously from open source software, both as a learning experience and as a tool that enhanced my own research. Back then I made a commitment to pay it forward.” — Olga Vitek, Northeastern University.

MSstats and Cardinal are statistical software tools for mass spectrometry-based experiments, which allow scientists to measure the mass of different molecules in a sample.

Open source tools help scientists with limited resources and democratize access to complex and expensive technology. They invite critical evaluation along with positive feedback, improvements, and contributions. This discourse is key for scientific progress and for verifying the reproducibility and reliability of experimental results. Open source tools also allow young scientists to learn.

We hope that these projects will provide broad statistical support to researchers who work in a continuously evolving biotechnological landscape in a way that is stable, reliable, and robust.

Kevin Eliceiri, MicroManager and ImageJ

Kevin Eliceiri of the University of Wisconsin, Madison, aligns a laser-scanning microscope controllable by the open source software tool μManager. Lab-built microscopes like this one help researchers answer important scientific questions, such as how cancer progresses.

μManager, a software package for control of automated microscopes, allows the end user to customize microscopy setups for a wide range of biological studies. Together with the image processing application ImageJ, μManager provides a comprehensive and freely available imaging solution. We are working on improving μManager’s architecture, infrastructure, and support to ensure many years of growth in user base and capabilities.

“I’ve always believed that science is best done by building on the work of others and openly sharing what you have done. Open source software not only saves time and resources, but can directly lead to new innovation and discovery.” — Kevin Eliceiri, University of Wisconsin, Madison.

Open source software is all about accessibility and transparency, allowing scientists to try new approaches and understand precisely what was done. Open source software enables unhindered adoption and makes it possible to take tools into new directions beyond their original intent.

Mark Musen, Protégé

The Protégé team, which includes Mark Musen of Stanford University, regularly holds short courses to train users in the technology.

Dozens of clinical specialists around the world used WebProtégé to craft the latest version of the International Classification of Diseases (ICD), which provides a comprehensive list of potential health problems and causes of illness. The World Health Organization and most countries worldwide use this classification system to track public health. In the United States, ICD plays an essential role in communicating patient conditions throughout the healthcare system.

“It’s very exciting for us to be developing software that has achieved such widespread use and that is helping to advance science in so many ways.” — Mark Musen, Stanford University.

Thousands of scientists use our Protégé software. Many of them contribute back to our work by creating plug-ins that add new functionality to the system that we would never have dreamed of developing ourselves. We’re working to construct the next generation of Protégé using a modern web stack that will make the software easier to maintain and extend, and make it easier for third parties to contribute to the code base.

Mackenzie Mathis, DeepLabCut

DeepLabCut allows researchers to track and label the body parts of moving animals such as this cheetah.

DeepLabCut allows researchers to automatically track and label the body parts of moving animals to better understand their behavior. Neuroscientists Mackenzie Mathis and Alexander Mathis developed the tool to automate the time-consuming process of labeling hundreds of frames and to increase the accuracy of labeling through deep learning.

“We wanted to create a tool that was easy to use without computer vision expertise. We hope to continue to enable scientists to do research in more real-world contexts.” — Mackenzie Mathis, Harvard University and the Swiss Federal Institute of Technology Lausanne.

Several areas of research, including neuroscience, medicine, and biomechanics, use data from tracking movement. Scientists are already using DeepLabCut to study octopuses, electric fish, and even the movements of robots that assist doctors in performing surgery.

Antonin Delpeuch, OpenRefine

Antonin Delpeuch of OpenRefine and Code for Science and Society.

Antonin Delpeuch works on OpenRefine, a tool for analyzing messy and large datasets. OpenRefine helps researchers quickly identify and fix issues in spreadsheet or tabular data. Its automated functions handle problems such as splitting cells that contain multiple data values, detecting duplicates, standardizing date formats, trimming extra spaces from cells, and combining multiple datasets into a single spreadsheet.
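To make the kinds of cleanup described above concrete, here is a minimal sketch in plain Python. It is not OpenRefine's own code or interface; the sample rows, the column names, and the two accepted date formats are all invented for illustration.

```python
from datetime import datetime

# Hypothetical messy rows, invented for this illustration only.
rows = [
    {"sample": " A1 ", "genes": "BRCA1;TP53", "collected": "2019-03-04"},
    {"sample": "A1", "genes": "BRCA1; TP53", "collected": "04/03/2019"},
    {"sample": "B2", "genes": "EGFR", "collected": "2019-05-12"},
]

# Assumed input formats: ISO dates and day/month/year dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(text):
    """Return the date in ISO 8601 form, trying each assumed format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return text  # leave unrecognized values untouched


def clean(rows):
    """Trim spaces, split multi-valued cells, fix dates, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "sample": row["sample"].strip(),
            # Split a cell holding several values into separate values.
            "genes": tuple(g.strip() for g in row["genes"].split(";")),
            "collected": standardize_date(row["collected"]),
        }
        key = (record["sample"], record["genes"], record["collected"])
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(record)
    return cleaned


cleaned = clean(rows)
```

In this sketch the first two rows turn out to describe the same record once spaces are trimmed and the dates share one format, so only one of them is kept. A tool like OpenRefine lets researchers apply steps of this kind interactively rather than by writing code.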

The UK Software Sustainability Institute’s motto sums it all up: “Better software, better research.”

“Research software has to be open source if we want to take reproducibility seriously.” — Antonin Delpeuch, Code for Science and Society.

I find it hard to imagine working on software in any other way! The initial spark came when my biology teacher introduced me to Linux; I was about 12 and I have been learning since then.

I like how open source questions the idea of ownership. The easier it is for someone else to step in and take responsibility, the more resilient the project becomes. It really changes the way you think about your own work, and I would like to see this happen more often in science, too.

Hannes Rost, OpenMS

OpenMS analyzes thousands of data slices like the one depicted here to piece together how human proteins affect health and disease.

“We are working on a technology that allows us to perform thousands of clinical tests in less than one hour — for a fraction of the cost.” — Hannes Rost, University of Toronto.

We use mass spectrometry, which can detect hundreds to thousands of analytes, or substances being measured, directly from human samples such as blood. One of the challenges in mass spectrometry is the complexity of the data. Our software tools sift through the data, separate signal from noise, report reliable quantitative values, visualize the results, and present the user with interpretable data.

Our software is advancing scientists’ ability to detect proteins in multiple dangerous pathogens such as Streptococcus pyogenes (which can cause strep throat) and Mycobacterium tuberculosis (which causes tuberculosis). OpenMS is also used to reliably measure cancer samples. These measurements have led to a better understanding of how pathogens protect themselves when human immune systems try to fight them off, and how cancer tissue differs from normal tissue.

A better understanding of human disease will hopefully also lead to the identification of novel drugs targeting proteins that are involved in the disease process, potentially improving the quality of life for patients or curing diseases.

Emmanuelle Gouillart, scikit-image and Dash

Emmanuelle Gouillart of Plotly Technologies, Inc.

Imaging of molecules, cells, and tissues is central to biomedical research and clinical practice, allowing scientists to understand and identify disease. A single image of a tissue sample can contain millions of cells of different types, and scientists need faster, better ways to analyze that data. Emmanuelle Gouillart develops tools to do just that.

We are building a suite of advanced scientific tools that combines our data science analytics technology, called Dash, with scikit-image — the most popular image processing library for science. Our image processing applications help extend scikit-image’s use to non-programmers and ultimately help close the gap between programmers and experts in health-specific image processing programs.

“I love empowering users with great tools that are not only designed brick-by-brick by large and diverse communities, but are also freely available for anyone to use and improve upon.” — Emmanuelle Gouillart, Plotly Technologies, Inc.

Open source software enables countless people with different perspectives and areas of expertise to work together towards a shared goal. The collaboration that open source facilitates means that users are constantly exposed to completely new, sometimes crazy, always enlightening ideas from all over the world. Scientific advancement is critically dependent on that kind of collaboration.

Gordon Smyth, limma, edgeR and Glimma

Developer Gordon Smyth, with Charity Law and Yunshun Chen, stands by an image of chromosomal interactions detected with the tools edgeR and diffHic. Interactions like these pinpoint how the 3D structure of DNA allows genes to be turned on and off as needed.

Gordon Smyth creates software tools that allow scientists to interpret genomic data in powerful and flexible ways, helping to make biomedical discoveries. His bioinformatics lab collaborates with other researchers at the Walter and Eliza Hall Institute in Melbourne, Australia, to understand the biology and treatment of breast cancer. A key success that depended on their software was the discovery of the cell of origin for the most invasive form of breast cancer.

Modern genomic technologies produce huge amounts of data that allow us to examine which genes are switched on and how active they are in any type of cell at any time. We develop advanced methods and software to interpret this information. In our research, we try to understand how genes behave in healthy cells and what changes when things go wrong, leading to possible treatments for human diseases.

Open source software is an essential tool for reproducible research. Whenever we publish biomedical discoveries, we are able to make available at the same time our bioinformatics code and software packages, allowing other researchers to repeat and validate our analyses.

Greg Caporaso, QIIME 2

Greg Caporaso of Northern Arizona University (NAU). Photo courtesy of NAU.

The QIIME 2 software has already helped researchers discover a potential new avenue of treatment for autism, which will be explored in further clinical trials. Thanks to a grant from CZI, the community is hosting its first-ever co-convened user and developer workshop and networking event.

In 2013, my lab joined a project with researchers studying autism at Arizona State University, who were intrigued by studies that suggested the gut microbiome differed between children with autism and children without it. We performed an early stage clinical trial to test whether altering the gut microbiome of children with autism using a fecal microbiota transplant could sustainably change the composition of their gut microbiome and reduce the severity of their symptoms.

As part of this project, my lab developed novel algorithms to track engraftment, or the extent to which the microbes of a donor microbiome establish themselves in a recipient. After treatment, the microbiomes of the children with autism resembled those of the children without autism more than they did at the beginning of the study.

We observed no adverse effects, and while this early stage trial was designed primarily to evaluate the safety of the treatment, we saw that many of the children experienced a reduction in the severity of their symptoms during the following two years.

Check out the full list of open source projects we’re supporting and apply for funding via the second cycle of our EOSS request for applications (closing February 4, 2020).

