Conversational Speech Transcription Using Context-Dependent Deep Neural Networks

Dong Yu; Frank Seide; Gang Li

ICML 2012 | June 2012

Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and deep-belief-network pre-training. CD-DNN-HMMs greatly outperform conventional CD-GMM (Gaussian mixture model) HMMs: The word error rate is reduced by up to one third on the difficult benchmarking task of speaker-independent single-pass transcription of telephone conversations.
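In hybrid neural-network/HMM systems of the kind the abstract describes, the network is commonly trained to estimate posterior probabilities of the context-dependent HMM states given an acoustic frame, and those posteriors are divided by the state priors to obtain scaled likelihoods for HMM decoding. The following is a minimal sketch of that conversion only, not the authors' implementation. The function names, the three-state setup, and all numeric values are hypothetical and chosen purely for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame's output activations."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Turn state log-posteriors log p(s|o) into scaled log-likelihoods.

    By Bayes' rule, log p(o|s) = log p(s|o) - log p(s) + log p(o).
    The term log p(o) is the same for every state in a frame, so it is
    dropped; the result is a likelihood up to a per-frame constant.
    """
    return [
        [lp - prior for lp, prior in zip(frame, log_priors)]
        for frame in log_posteriors
    ]


# Hypothetical toy setup: 3 tied context-dependent states, 2 frames.
# Priors would normally be estimated from state frequencies in the
# training alignment; these values are made up.
log_priors = [math.log(p) for p in (0.5, 0.3, 0.2)]

# Made-up output-layer activations, one list per frame.
frame_logits = [[2.0, 0.5, 0.1], [0.2, 0.2, 1.5]]

log_posteriors = [log_softmax(f) for f in frame_logits]
scaled = scaled_log_likelihoods(log_posteriors, log_priors)

for t, frame in enumerate(scaled):
    print(f"frame {t}: " + ", ".join(f"{v:.3f}" for v in frame))
```

Dividing by the prior matters because a frequent state would otherwise be favored twice during decoding: once through the network's posterior and again through the HMM's own transition and language-model structure.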