Multi-scale Genomic Inference using Biologically Annotated Neural Networks
- Pinar Demetci
- Wei Cheng
- Gregory Darnell
- Xiang Zhou
- Sohini Ramachandran
- Lorin Crawford
bioRxiv
With the emergence of large-scale genomic datasets, there is a unique opportunity to integrate machine learning approaches as standard tools within genome-wide association (GWA) studies. Unfortunately, while machine learning methods have been shown to account for nonlinear data structures and exhibit greater predictive power than classic linear models, these same algorithms have also been criticized as “black box” techniques. Here, we present biologically annotated neural networks (BANNs), a novel probabilistic framework that makes machine learning fully amenable to GWA applications. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network in which the input layer encodes SNP-level effects and the hidden layer models the aggregated effects among SNP-sets. A key part of our innovation is to treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects are manifested at different genomic scales. The BANN software uses scalable variational inference to provide posterior summaries that allow researchers to simultaneously perform (i) fine-mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations and real GWA data applications, we show that our method improves upon state-of-the-art approaches in both settings across a wide range of genetic architectures.
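To make the partially connected architecture concrete, the following is a minimal NumPy sketch, not the BANN software or its API. It assumes a toy annotation in which each SNP belongs to exactly one SNP-set, uses fixed randomly drawn weights, and picks `tanh` as an illustrative activation. The names `annotation_mask`, `forward`, and `snp_to_set` are hypothetical. The prior distributions and the variational inference described in the abstract are deliberately left out.

```python
import numpy as np

def annotation_mask(snp_to_set, n_sets):
    """Binary mask of shape (n_snps, n_sets).

    Entry [j, g] is 1 when SNP j is annotated to SNP-set g, else 0.
    Assumes (for illustration only) that each SNP maps to one set.
    """
    n_snps = len(snp_to_set)
    mask = np.zeros((n_snps, n_sets))
    mask[np.arange(n_snps), snp_to_set] = 1.0
    return mask

def forward(X, W, mask, w_out, activation=np.tanh):
    """Forward pass through an annotation-masked network.

    X     : (n_samples, n_snps) genotype matrix
    W     : (n_snps, n_sets) input-layer weights (SNP-level effects)
    mask  : (n_snps, n_sets) annotation mask; zeroes unannotated links
    w_out : (n_sets,) hidden-to-output weights (SNP-set-level effects)
    """
    # Each hidden unit aggregates only the SNPs annotated to its set.
    hidden = activation(X @ (W * mask))
    return hidden @ w_out

# Toy example: 6 SNPs grouped into 3 SNP-sets (hypothetical annotation).
rng = np.random.default_rng(0)
snp_to_set = np.array([0, 0, 1, 1, 1, 2])
mask = annotation_mask(snp_to_set, n_sets=3)

X = rng.integers(0, 3, size=(5, 6)).astype(float)  # genotypes coded 0/1/2
W = rng.normal(size=(6, 3))
w_out = rng.normal(size=3)

y_hat = forward(X, W, mask, w_out)  # one predicted trait value per sample
```

Because the mask zeroes every weight between a SNP and the sets it is not annotated to, each hidden unit can be read as the aggregated effect of one SNP-set, which is the source of the interpretability the abstract describes.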