Improved fragment sampling for ab initio protein structure prediction using deep neural networks
- Tong Wang,
- Yanhua Qiao,
- Wenze Ding,
- Wenzhi Mao,
- Yaoqi Zhou,
- Haipeng Gong
Nature Machine Intelligence, Vol 1(8): pp. 347-355
A typical approach to predicting the unknown native structure of a protein is to assemble short segments of amino acid residues (fragments) extracted from known structures. The quality of these fragments, which are used to build protein-specific fragment libraries, can determine whether sampling reaches near-native conformations. Here we show how a high-quality fragment library can be built using deep contextual learning techniques. Our algorithm, called DeepFragLib, employs bidirectional long short-term memory recurrent neural networks with knowledge distillation for initial fragment classification, followed by an aggregated residual transformation network with cyclically dilated convolution for detecting near-native fragments. DeepFragLib improves the position-averaged proportion of near-native fragments by 12.2% over existing methods and, consequently, produces better near-native structures for 72.0% of the free-modelling domain targets tested when integrated with Rosetta. DeepFragLib is fully parallelized and available for use in conjunction with structure prediction programs.

One approach to protein structure prediction is to assemble candidate structures from template fragments extracted from known protein structures. Wang et al. demonstrate that combining deep neural network architectures with a relatively small but high-resolution fragment dataset can improve the quality of the fragment libraries used for protein structure prediction.
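The abstract reports its headline result as a position-averaged proportion of near-native fragments. The sketch below shows one plausible reading of that metric: for each residue position, take the fraction of library fragments whose RMSD to the native structure falls under a cutoff, then average those fractions over positions. The function name, the input layout and the 1.5 Å cutoff are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List


def position_averaged_proportion(
    rmsd_by_position: Dict[int, List[float]],
    cutoff: float = 1.5,  # hypothetical near-native RMSD cutoff in angstroms
) -> float:
    """Average, over positions, of the fraction of near-native fragments.

    rmsd_by_position maps a residue position to the RMSD values (fragment
    versus native structure) of the fragments proposed at that position.
    Positions with no fragments are skipped.
    """
    proportions = []
    for rmsds in rmsd_by_position.values():
        if not rmsds:
            continue
        near_native = sum(1 for r in rmsds if r <= cutoff)
        proportions.append(near_native / len(rmsds))
    if not proportions:
        raise ValueError("no positions with fragments")
    return sum(proportions) / len(proportions)


# Toy library (made-up RMSD values): three positions with 4, 2 and 4 fragments.
library = {
    0: [0.8, 1.2, 3.0, 4.1],  # 2 of 4 near-native
    1: [2.5, 2.9],            # 0 of 2 near-native
    2: [0.5, 1.0, 1.4, 2.0],  # 3 of 4 near-native
}
score = position_averaged_proportion(library)
print(f"{score:.3f}")
```

Averaging per position rather than pooling all fragments keeps positions with many candidates from dominating the score, which is presumably why the result is reported per position.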