Augmenting subnetwork inference with information extracted from the scientific literature
- Sid Kiblawi ,
- Deborah Chasman ,
- Amanda Henning ,
- Eunju Park ,
- Hoifung Poon ,
- Michael Gould ,
- Paul Ahlquist ,
- Mark Craven
PLOS Computational Biology |
Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge that arises in these studies is to explain how genes identified as relevant in the given experiment are organized into a subnetwork that accounts for the response of interest. The task of inferring a subnetwork is typically dependent on the information available in publicly available, structured databases, which suffer from incompleteness. However, a wealth of potentially relevant information resides in the scientific literature, such as information about genes associated with certain concepts of interest, as well as interactions that occur among various biological entities. We contend that by exploiting this information, we can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that we can use literature-extracted information to (i) augment the set of entities identified as being relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways that are regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks. Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference (opens in new tab).