Deep Neural Network for Automatic Speech Recognition: from the Industry’s View
Recently, a new acoustic model, referred to as the context-dependent deep neural network hidden Markov model (CD-DNN-HMM), has been developed. Many groups have shown that it outperforms conventional GMM-HMMs in many automatic speech recognition (ASR) tasks. It has been widely deployed in real-world products from Microsoft, Google, and others, and benefits millions of users. Given this success, I will talk about the most important research topics for DNNs in industry and how we handle these challenges by answering the following questions: 1) How can we reduce the runtime cost without accuracy loss, so that users feel no increase in latency when switching from GMM to DNN? 2) How can we do speaker personalization with a small footprint? 3) How can we enable new languages with limited training data? 4) How can we reduce the accuracy gap between large and small DNNs? 5) How can we handle unseen scenarios? 6) How can we enable a DNN to handle different factors?