Taking Bioinformatics to Heart: A Q&A with Katherine Pollard

Transforming bioinformatics into a guiding force in research

For a scientist destined to earn renown for her skill in creating biostatistical tools and extracting meaning from large datasets, Katherine Pollard’s initial impulses toward the study of math were somewhat conflicted: “I wasn’t sure I saw myself as a math major, until a professor convinced me to think about it,” she says of her first collegiate year.

By graduation, she had arrived at a happy medium between numbers and her interest in human physiology and evolution: a double major in mathematics and anthropology. “I guess I found people more interesting than equations,” she says, “but I figured out a way to study people by using equations, so it worked out okay.”

Indeed. Pollard, who has a PhD in biostatistics, is now a Professor of Epidemiology and Biostatistics at the University of California, San Francisco, the Director of the Gladstone Institute of Data Science and Biotechnology, and a Chan Zuckerberg Biohub Investigator. She has already accumulated an impressive array of research achievements. These include contributions to cancer genetics, the compilation of the first genome for the chimpanzee, and genomic insights into human evolution and a variety of diseases. Pollard has also recently been elected to the College of Fellows of the American Institute for Medical and Biological Engineering (AIMBE).

The thread connecting these contributions is her focus on understanding how the genome is organized and regulated, and how it evolves. Recently, this has led Pollard to the study of the 4D nucleome — how DNA is packed and organized in the nucleus of a cell and how that organization changes over time.

Late last year, Pollard, with Gladstone Institutes colleague Benoit Bruneau, was awarded a 4D Nucleome grant from the National Institutes of Health. Under this five-year grant, the two researchers and their teams will further their investigations into the process of DNA folding in cells, and how mutations drive irregularities in this folding in the embryonic heart, ultimately contributing to heart disease.

In the following interview, Pollard discusses her evolution as a researcher, the new project on the developing heart, and her commitment to training women and other underrepresented groups for science careers.

What drew you to bioinformatics?

Pollard: Well, at the time bioinformatics didn’t yet exist as a field, so it’s not like I could have said I wanted to study bioinformatics, even if I grasped that that’s what I wanted to do.

If there was a moment when things kind of came together, it was when I was a postdoc. I joined a lab at the University of California, Santa Cruz that was the first to put all the pieces of the human genome together. And they built a visualization tool called the UCSC Genome Browser so that you could navigate around it, kind of like a Google Maps for the genome.

When I came to the lab, the principal investigator, David Haussler, said, “We’re getting a chimpanzee genome in. Does anybody want to look at that?” And I realized that I’d spent a lot of my time as an undergrad studying chimps and other non-human primates and comparing them physiologically, behaviorally, and medically to humans using whole-organism comparative approaches. It was then that I realized I was in the perfect position to compare chimps and humans on a genetic basis, and then to try to link the genetic differences between our species to things like behavioral or neurological differences.

I seemed more suited than other people in the lab who didn’t have that background in evolution and anthropology, so it was a moment when I realized I’d gotten where I was supposed to go.

“Even though I took a bit of a winding path to get there, I was really excited: This comparison of human and chimp was what I’d been training my whole life to do. I just hadn’t realized it.”

Heart cells created from stem cells. Katherine Pollard and Benoit Bruneau are collaborating to study DNA folding and organization within the nucleus in human heart cells. Photo courtesy of Kim Cordes, Gladstone Institutes.

How did you go from applying your bioinformatics skills to the chimpanzee genome to studying a variety of human diseases?

Pollard: As a PhD student I actually worked a lot on cancer because that’s where the data were at the time. Later, as a postdoc, I was able to apply my bioinformatics skills in a new way. My goal was to use a comparative approach to understand how genomes work. It turned out that the problem of comparing a tumor to healthy tissue is mathematically similar to comparing the human genome to another species’ or comparing two humans whose genomes aren’t identical.

So, along with my colleagues, I began to investigate what parts of the human genome are most different between a human and chimp — or which parts are uniquely human. We found that most of the differences between chimps and humans are not in our proteins. The distinct DNA regions are mostly outside of protein-coding genes; they’re sequences that control gene expression, or when genes get turned on and off, but they aren’t the genes themselves.

These DNA regions are “enhancers”?

Pollard: Exactly. But we didn’t know that at the time. They were called “junk DNA.” My lab spent a decade or more studying how enhancers work so we could interpret the results obtained when I was a postdoc.

It turns out that many of these enhancers — these non-coding sequences — are adjacent to genes that are important in making the body’s plan. This got me interested in development in general, along with gene regulation, and in the development of diseases.

You’ve recently been awarded a grant to study DNA folding and organization within the nucleus in human heart cells. Why focus on the nuclear DNA and its role in congenital heart disease?

Pollard: If you compare a human to a chimp or a kid with a heart defect to a healthy person, the biggest genetic difference is actually missing chunks of DNA — big deletions, duplications of pieces, and rearrangements of chromosomes. So I’ve become interested in these larger structural changes in chromosomes and how they cause disease, as well as how they also cause evolutionary differences between humans from different ancestral populations and between humans and chimps — and in particular, how those cause the DNA to fold differently.

You may know that inside every teeny tiny cell in our body, there’s about two meters of DNA — it’s this long, super-skinny fiber and it has to fold up to fit in a cell. It doesn’t fold randomly, and the way it folds is very important to its function.

The idea is that the key to understanding how species differ, or how disease occurs, lies in the mutations that cause different folding in the genome. I’ve been interested in that in general, and we’ve now been funded to look at it in the heart.

Crammed into the nucleus of every cell is a 2-meter-long fiber of DNA. Understanding the three-dimensional organization of the nucleus in space and time and how it influences function is the goal of the 4D Nucleome project. This schematic shows the 3D folding of human chromosome 7. Image: Brant L., et al. Mol Syst Biol (2016) 12: 891.

And I gather this project also involves an AI model of genome folding that you are, in effect, training.

Pollard: Yes, we’re doing machine learning. The idea is to train the computer to learn the relationship between sequence and structure for chromosomes. Perhaps you saw the big news recently involving DeepMind and how their AI is able to predict protein structure from protein sequence. We’re working on the same kind of problem, except we’re going from DNA sequence to DNA structure. And we’re also using deep learning.

Ideally, at the end of the five years, what will you have accomplished?

Pollard: We have a preliminary, fairly generic model that we’re trying to make more accurate and more specific to the heart. I expect that in the first year we’ll probably accomplish that.

And then the goal, or the opportunity, is that when you have a computer model you can test a lot of hypotheses very rapidly without needing patients and heart tissue and such. You can actually use the computer and enter genetic mutations that you see in patients, including mutations that you’ve never seen before because they’re probably lethal, and you can see what predictions the model makes. That will help us screen very rapidly on the computer a lot of potential things that we might want to design experiments for and test in the lab. So using the computer model to design the experiments then leads to work in the lab, which iterates back and helps us improve the model and hopefully gets us much more rapidly to answers.
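For readers curious what this kind of computational screening might look like, here is a minimal sketch of the general idea: apply candidate mutations to a reference sequence, ask a model for the predicted folding of each version, and rank the mutations by how much the prediction changes. It is not the lab's actual model or code. The function `toy_predict_contacts` is a hypothetical stand-in that scores contacts from GC content alone, and the bin size, sequences, and variant names are invented for illustration.

```python
from typing import List, Tuple

BIN = 4  # bases per bin; a toy value chosen for illustration only

def gc_fraction(chunk: str) -> float:
    """Fraction of G or C bases in a chunk of sequence."""
    return sum(base in "GC" for base in chunk) / len(chunk)

def toy_predict_contacts(seq: str) -> List[List[float]]:
    """Hypothetical stand-in for a trained sequence-to-folding model.

    Bins with similar GC content that sit close together get a higher score.
    A real model would be a trained deep network, not this hand-written rule.
    """
    bins = [gc_fraction(seq[i:i + BIN]) for i in range(0, len(seq), BIN)]
    n = len(bins)
    return [
        [(1 - abs(bins[i] - bins[j])) / (1 + abs(i - j)) for j in range(n)]
        for i in range(n)
    ]

def substitute(seq: str, start: int, replacement: str) -> str:
    """Overwrite part of the sequence, keeping its total length unchanged."""
    return seq[:start] + replacement + seq[start + len(replacement):]

def disruption(ref_map: List[List[float]], alt_map: List[List[float]]) -> float:
    """Mean squared difference between two predicted contact maps."""
    n = len(ref_map)
    total = sum(
        (ref_map[i][j] - alt_map[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return total / (n * n)

def rank_variants(
    reference: str, variants: List[Tuple[str, int, str]]
) -> List[Tuple[str, float]]:
    """Score each (name, start, replacement) variant, most disruptive first."""
    ref_map = toy_predict_contacts(reference)
    scored = [
        (name, disruption(ref_map, toy_predict_contacts(substitute(reference, start, repl))))
        for name, start, repl in variants
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up reference sequence and two made-up candidate variants.
reference = "ATATGCGCATATGCGC"
variants = [("silent", 0, "TATA"), ("gc_swap", 0, "GCGC")]
for name, score in rank_variants(reference, variants):
    print(name, round(score, 4))
```

In this toy setup, the variant that changes the GC content of a bin ranks above the one that leaves it unchanged. In a real workflow, the top-ranked variants would be candidates for follow-up experiments in the lab, whose results could then be used to refine the model.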

Beyond the 4D Nucleome grant, what other research aims or aspirations do you have?

Pollard: My lab is funded to solve this problem for heart development, but I would like to apply the same strategy to other diseases, other developmental transitions, and other comparisons that can be made. Another really cool thing about a model is you can feed in any genome, even one where you couldn’t measure the genome’s folding. You can use the model to make a prediction. All you need is the DNA sequence.

So, we have a new project with Tony Capra’s lab feeding the genomes of Neanderthals or other extinct hominins into our model and seeing if we can predict differences in how their genomes fold compared with a modern human’s. That’s cool because you could never do the experiment: you’re never going to have their cells and be able to see how the genome is folding. But the model can make a prediction. So, given that we can now predict human and chimp folding well, we should be able to make predictions about what’s actually happening in the cells of extinct ancestors of chimps and humans.

The geometry of the genome influences which genes are active and which ones are silent. This computer-enhanced electron micrograph shows interacting chains of chromatin, the DNA-protein structure, in the cell nucleus. Image courtesy of Clodagh C. O’Shea, Salk Institute, La Jolla, Calif.

What else would you like to communicate about your work?

Pollard: I’ve spent most of my career at the end of experiments. People have come up with hypotheses, conducted experiments, and produced a lot of data. Then, they say, “Katie, we’ve got this huge mess of data — can you make any sense of it?”

It’s very reactive. It’s kind of downstream.

“What I’ve been discussing with you is a completely different strategy, in which the computer modeling is upstream, helping to design the experiments and helping us as scientists figure out where our blind spots are and what the next step might be. That’s a very different role for data science in biomedical research from what it normally is and from what I’ve been doing most of my career.”

Looking ahead even further: Can you anticipate how, at the end of your career, you might finish the phrase, “What I’ll always be proudest of is…”

Pollard: I think I’ll be most proud of the people I trained, and their successes and their careers, which will live on after I’m gone, and especially if I’ve been successful in training more women or other underrepresented groups to be rock-star scientists.

Written by Christopher King



Chan Zuckerberg Initiative Science

Supporting the science and technology that will make it possible to cure, prevent, or manage all diseases by the end of the century.