Regularized Sequence-Level Deep Neural Network Model Adaptation
Interspeech 2015
We propose a regularized sequence-level (SEQ) deep neural network (DNN) model adaptation methodology as an extension of the previous KL-divergence regularized cross-entropy (CE) adaptation. In this approach, the negative KL-divergence between the baseline and the adapted model is added to the maximum mutual information (MMI) criterion as regularization in the sequence-level adaptation.
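As a concrete illustration, a schematic form of such a regularized sequence-level objective can be written in LaTeX as below. The notation here (baseline parameters \theta_0, adapted parameters \theta, regularization weight \rho, and frame-level senone posteriors p(\cdot \mid x_t)) is chosen for exposition and may differ from the paper's exact formulation:

\[
\mathcal{F}_{\mathrm{reg}}(\theta) = \mathcal{F}_{\mathrm{MMI}}(\theta) - \rho \sum_{t} D_{\mathrm{KL}}\bigl( p_{\theta_0}(\cdot \mid x_t) \,\big\|\, p_{\theta}(\cdot \mid x_t) \bigr)
\]

Maximizing \mathcal{F}_{\mathrm{reg}} thus trades off the sequence-level MMI gain against drifting too far from the baseline model's frame-level posteriors.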
We compared eight adaptation setups, specified by the baseline training criterion, the adaptation criterion, and the regularization methodology. We found that the proposed sequence-level adaptation consistently outperforms the cross-entropy adaptation, and that regularization is critical for both. We further introduced a unified formulation of which the regularized CE and SEQ adaptations are special cases.
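One way such a unified criterion could be sketched (an illustrative assumption, not necessarily the formulation used in the paper; \lambda is a hypothetical interpolation weight, and \mathcal{F}_{\mathrm{CE}} denotes the negated cross-entropy so that both criteria are maximized) is

\[
\mathcal{F}(\theta) = (1-\lambda)\,\mathcal{F}_{\mathrm{CE}}(\theta) + \lambda\,\mathcal{F}_{\mathrm{MMI}}(\theta) - \rho \sum_{t} D_{\mathrm{KL}}\bigl( p_{\theta_0}(\cdot \mid x_t) \,\big\|\, p_{\theta}(\cdot \mid x_t) \bigr),
\]

which would reduce to the regularized CE adaptation at \lambda = 0 and to the regularized SEQ adaptation at \lambda = 1.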
We applied the proposed approach to speaker adaptation and accent adaptation in a mobile short message dictation task. For speaker adaptation with 25 or 100 utterances, the proposed approach yields a 13.72% or 23.18% WER reduction when adapting from the CE baseline, compared with 11.87% or 20.18% for the CE adaptation. For accent adaptation with 1K utterances, the proposed approach yields an 18.74% or 19.50% WER reduction when adapting from the CE-DNN or the SEQ-DNN; the WER reductions using the regularized CE adaptation are 15.98% and 15.69%, respectively.