When it comes to cancer, too much data is not enough

When it comes to modern biomedical research, new experimental methods yield vast amounts of data. For example, a standard experiment might simultaneously determine millions of DNA markers across the entire genome. While this sounds like a lot of data, for Guoqiang Yu, it isn't nearly enough.

Guoqiang Yu

Yu joined the ECE faculty as an assistant professor in August 2012. A member of the Computational Bioinformatics and Bio-imaging Laboratory (CBIL), he seeks the genetic causes of diseases from breast cancer to drug abuse. By itself, each gene has a very minor effect, but for many diseases, that effect becomes significant when the gene interacts with certain combinations of other genes. Finding these critical interactions will help guide researchers to new diagnostic tests and treatments.

Because of the number of ways genes can act on each other, the amount of data required to find these effects expands rapidly. When you add external environmental factors, the data load expands even more. And with so many potential interactions and dependencies, it is not enough to look at just a small portion of the problem at a time.

Yu is using this expanded data set — and more — in his research. He combines information from The Cancer Genome Atlas (TCGA), SNP genotyping, behavioral assessments, and neuroimaging, among other sources.

"I have been working on the complex interactions between genes and environmental factors," Yu explains. "Now I want to extend the work to complex interactions between different phenotypes," or observable characteristics. Yu is taking a systems approach to explore the big picture formed by all this data, and he is combining machine learning, signal processing, mathematics, and computational bioinformatics.

A visual representation of one of Yu's algorithms.

One of Yu's projects is developing a method to sort through interactions and determine which ones are important. He hunts for interactions that might be important as well as for places that might be missing important interactions. "It is very challenging to distill from complicated and noisy data definitive underlying rules," Yu says, but he believes a comprehensive view of the interactions will help.

Another project has sent Yu searching for genetic modifiers associated with Duchenne muscular dystrophy. He hopes that looking into the complex interplay between multiple phenotypic manifestations will better reveal the modifiers.

"My research is very rewarding," says Yu. "We feel we are contributing to society. It's also exciting to be the first person to understand a problem." He admits that this feeling doesn't come often, but when an idea works after a hundred failures, "I get in this mood and just feel excited...I cherish that feeling."

Spotlight on New Faculty Members

Guoqiang Yu came to Virginia Tech after a post-doctoral fellowship at Stanford University School of Medicine and Bio-X. He earned his B.S. in Electrical Engineering from Shandong University in 2001, his M.S. from Tsinghua University in 2004, and his Ph.D. from Virginia Tech in 2011. Yu has published 25 journal and conference papers and contributed to a chapter of a recent book titled Statistical Diagnostics of Cancer. Yu is posted in the Capital Region, at the Virginia Tech Research Center in Arlington.