Open Data is Key to the Future of Medicine

New Chair of CZI Science Advisory Board Weighs in on How Data Sharing Accelerates Science

A doctor wants to better understand cardiovascular disease, the number one cause of death worldwide, so she focuses on her patients with heritable cardiomyopathies. She begins by collecting their echocardiograms, ultrasound-based images of the heart, then searches for data from teams elsewhere studying the same problem to compare with her own.

This task will probably be challenging (if not impossible) for the hypothetical clinician-scientist, says Molly Maleckar, PhD. The barriers that keep scientists from sharing their data are numerous and often difficult to overcome. Maleckar knows firsthand, having worked for years to tear them down with the aim of furthering biomedicine.

She imagines a future in which artificial intelligence fed by open data sources helps us understand, diagnose, and treat disease early. Doctors, equipped with these new tools that ease their burden, have more time to spend with their patients and more time to ask the right questions. The patients themselves have more access to their data, a better understanding of their problems, and a greater ability to take control of their healthcare outcomes.

Data sharing accelerates us toward this reality, says Maleckar, who chairs the Chan Zuckerberg Initiative’s scientific advisory board.

“Open data is something that CZI, with its expertise in technology and its commitment to open science, is uniquely placed to understand,” she says. “CZI is stepping into this space and doing hard but critical work that receives little in the way of federal funding.”

Molly Maleckar, PhD, is chair of the Chan Zuckerberg Initiative’s scientific advisory board. Photo courtesy of Molly Maleckar.

The CZI-developed cellxgene, a tool that allows scientists to visualize large single-cell datasets generated from millions of individual measurements, is one example of this work. This platform now hosts data from projects that range from the RNA profiles of neural cells in the lab mouse, to profiles of nearly a million human cells gathered from two dozen organs in the body, to a new cell atlas of the mammalian brain motor cortex that could provide key insight into how to treat brain diseases like ALS. The goal is to put immense amounts of rich scientific knowledge at the fingertips of any scientist or clinician with a question that they would like to explore.

The only scientist in a family of artists and journalists, Maleckar has long had an interest in systems problems that can’t be solved at a single scale of complexity. At the Allen Institute for Cell Science, where she was director of mathematical modeling, Maleckar led efforts to understand the nature of cells, from their component genes and proteins to their agglomeration in tissues and organisms. Her current work on cardiovascular diseases at Norway’s Simula Research Laboratory (and the affiliated ProCardio) requires bringing together diverse groups: clinicians, patients, industry imaging technologists, electrophysiologists, AI experts, data scientists, and scientists studying biophysical simulations.

“I’m an interdisciplinary, international scientist who enjoys uniting people coming from different scientific and cultural backgrounds to work on a problem that can be solved only by understanding its components in concert,” she says.

Scientist Molly Maleckar works to bring together diverse groups to solve scientific problems. Photo courtesy of Molly Maleckar.

For Maleckar, the data sharing problem is also multidimensional. There’s the sociological aspect: the hesitancy to share sometimes displayed by people working in publish-or-perish environments, for example. That’s where a culture shift is needed, in line with the open science movement building momentum. She also advocates for rethinking government policies and legal regulations that too often serve to unintentionally stymie progress.

There are also technical problems to address. Even if everyone backed the idea of anonymized public research data tomorrow, the repositories needed to house it don’t exist.

“We need to create pipelines for distributing data and mechanisms that ensure fair democratic access for diverse communities across the globe to make use of that data,” says Maleckar. “This opens the door for new ideas and new innovations.”

Consider recent projects Maleckar worked on with Norwegian colleagues in Denmark, a country she highlights as having made significant progress in terms of medical data repositories. Her team trained machine learning models with thousands of electrocardiograms (ECGs) collected carefully from patients, with privacy safeguards in place. One model can be used to generate tens of thousands of artificial ECGs that can easily be used to further train AI systems. Another model lent insight to ECG analysis by explaining the decisions of deep neural networks. This AI revealed new features in the data and can help determine whether someone is at risk for heart disease.

Maleckar is excited to see what else could be possible with more data, freely shared.

When researchers share data, methods, software, and new discoveries in a timely and effective manner, other scientists can quickly learn from and build on their efforts, leading to faster scientific breakthroughs. CZI’s Open Science program empowers more people to engage in research practices that accelerate the pace, robustness, and reproducibility of science. It supports grantees and the broader scientific community in depositing software code in open repositories, making experimental protocols openly accessible, and submitting results to preprint servers to communicate findings more quickly.

“Everyone’s talking about accessing data,” says Maleckar. “CZI is leading the pack.”