What We Learned by Listening to Computational Biologists
On the CZI Science user experience research team, we use qualitative and quantitative research methods to understand scientists’ problems and identify human-centered solutions. Sometimes, our research is closely tied to a particular product or project, while other times, we ask broader questions.
In a recent project, we wanted to learn more about an entire field of science — computational biology. Within CZI, our own computational biology team plays a key role collaborating hands-on with scientists, exploring new opportunities, and building software with and for the scientific community. Across several of our projects, like the Human Cell Atlas Data Coordination Platform, cellxgene, and Starfish, we are helping build tools that enable both experimental and computational biologists to analyze, visualize, and share their data.
To make sure we’re building these tools in the right way, we wanted to learn more about how computational biologists are doing their work. In particular, we wanted to learn about the state of the field: who is participating, what their backgrounds are, and how they work. We also wanted to understand what parts of the computational biologists’ workflow are the most challenging, as those are promising focus areas for future CZI products or services.
To answer all these questions, we first visited and interviewed 12 computational biologists from around the world. These interviews included a tour of their lab space and an in-depth conversation about their backgrounds, career trajectories, and workflows. Based on the insights we gained, we followed up with a larger survey to see how our findings generalized to a broader group. More than 200 computational biologist participants, from a variety of biological sub-disciplines, completed the survey. Here are our top five takeaways:
1. “Computational biologist” can mean a lot of things
In our interviews and survey, people with identical job descriptions described themselves interchangeably as “bioinformaticians” and “computational biologists”. At the same time, those who described themselves as “computational biologists” sometimes had very different roles. For example, some computational biologists described their primary role as doing all of the analyses for a group of experimental collaborators, while others said they were more removed from the wet lab or field biology work and spent their time developing methods with other computational researchers.
Along with this diversity of job duties, participants took a wide range of paths to become computational biologists. In our survey, most participants had followed one of three career paths, in roughly equal proportions: starting off in biology and picking up computational methods along the way (about 29% of respondents); starting off in a more computational area (like computer science, mathematics, physics, or statistics) and then applying their skills to biology (about 25% of respondents); or directly studying computational biology or bioinformatics (about 22% of respondents).
Interestingly, a larger portion of principal investigators (PIs) in our survey had started off on the computational side and had become more biology-focused, while a larger portion of trainees (graduate students and postdoctoral fellows) either started in biology or specifically studied computational biology. This may reflect a broader shift in demand for this role, with computational biology moving from a niche role that could be filled by someone from another discipline to an integral part of biological research.
2. Many computational biologists are self-taught, and many have knowledge gaps
As noted, many computational biologists (56% in our survey) came to computational biology from another discipline, such as experimental biology, computer science, math, physics, or chemistry. In making this transition, many computational biologists reported teaching themselves a new set of skills. For instance, 22% of our survey participants said they were completely self-taught when it comes to writing code, while another 26% said their coding education was limited to a few classes, workshops, or online tutorials. More than 40% of those surveyed said they had never gathered biological data themselves.
“There’s a lot of information if you want to get into R and Python, but if you want to go a little further… I haven’t found the sources… It would be useful if someone with more software design experience could look over my shoulder and tell me what I could do more efficiently.”
In interviews, participants described some of the challenges of being self-taught. Several of the computational biologists said they felt they had gaps in their coding knowledge, and it was hard for them to “level up” and code more efficiently without formal training or mentorship. This was also reflected in our survey, where self-taught coders reported knowing fewer programming languages than those with more formal training, and were less likely to use helpful tools like GitHub.
3. Collaboration is crucial to being a computational biologist
A large majority (88%) of our survey participants said they are involved in collaborations outside their own labs at any given time, with an average of three collaborative projects at once. However, most participants said the majority of their collaborations were within their own institution.
Overall, most computational biologists in our survey said that finding collaborators was relatively easy, though graduate students and postdocs found it more difficult than PIs. However, in interviews, several people noted that it can be hard to find good collaborators outside of a department or research area. Some of our interview participants mentioned finding collaborators by chance, such as meeting at a party or a child’s school!
4. There are standard tools of the trade
Another set of tools that computational biologists use is for sending datasets back and forth (as part of all of those collaborations!). We were surprised to find that although most survey participants said they shared data with collaborators frequently (at least once a month for most respondents), few of them used specialized tools to do so. The most common solutions were file storage software like Dropbox or Google Drive, or sending data as an email attachment. Only a small number used solutions optimized for sharing large amounts of data, such as cloud storage services like Google Cloud or Amazon Web Services.
5. Open data and data organization are challenging!
In both the survey and the interviews, we asked participants what aspects of their work they found the most challenging. While we heard a lot of variability in people’s responses, many participants in the survey cited two main areas as the biggest challenges: finding high-quality open data to work with and organizing their own data.
Open data is crucial to many computational biologists’ work, with more than 75% saying they incorporate open data into their work monthly or more often. However, 28% of survey respondents also said finding open data was the most challenging task for them in their work, because the data they need is often disorganized, poorly curated, and hard to use.
Respondents expressed frustration about the organization of other people’s data, and they also struggled with organizing their own data. Another large group of respondents (32%) said that organizing and versioning their data was their hardest task. Because of the sheer quantity of data researchers are working with, keeping that data organized with multiple collaborators contributing and multiple versions of code is complex!
At a high level, this feedback supports our involvement in open data sharing projects like the Human Cell Atlas Data Coordination Platform, and suggests some ways we could get even more involved in assisting computational biologists in the future, perhaps by helping them organize and version their data. At a more tactical level, as we continue building tools for computational biologists, we are now even more confident in how highly we prioritize ensuring these tools work in R and Python and run on macOS.
This research also gives us a whole new set of problems to think about, and some places where we could have an impact on this community in the future. For example, we learned that self-taught computational biologists feel that they lack opportunities to improve their coding skills, and that these individuals are least comfortable with helpful tools like GitHub. We also learned that many computational biologists find it difficult to find collaborators outside of their department or research area. While we can’t solve these problems alone, these are the types of issues that we’d like to think about in future grants and programs.
Computational biology is a relatively new field, and it has grown rapidly in the past decade. We know that this research provides just a snapshot of how people are thinking about this role right now, so we hope that we can repeat this research in the coming years to understand how the field is changing.
To learn more about our work in science and to stay updated on funding opportunities, visit our website, where you can sign up for our mailing list. You can also follow us on Twitter. To learn more about our science team, follow the CZI science blog. And you can always reach us at email@example.com.
Adrienne Sussman, UX Researcher, Chan Zuckerberg Initiative
Adrienne works with scientists to build tools and platforms that enable open science. A former biologist, she transitioned into experimental psychology during her PhD at the University of Washington. After grad school, she spent several years as a user experience researcher at Google, where she worked with diverse users — from first-time smartphone users in Delhi to comic book readers in Tokyo.