Adaptation of Compressed HMM Parameters for Resource-Constrained Speech Recognition

Jinyu Li; Li Deng; Dong Yu; Jian Wu; Yifan Gong; Alex Acero

April 2008

Published by Institute of Electrical and Electronics Engineers, Inc.

We recently reported an unsupervised online adaptation technique, joint compensation of additive and convolutive distortions with vector Taylor series (JAC/VTS), which adjusts uncompressed HMMs to acoustically distorted environments [1]. In this paper, we extend JAC/VTS to compressed HMMs for settings where the computation and/or memory available for speech recognition is limited (e.g., on mobile devices). Subspace coding (SSC) is developed and used to quantize each dimension of the multivariate Gaussians in the compressed HMMs. We propose and evaluate three algorithmic design options for combining SSC with JAC/VTS, each making a different tradeoff between recognition accuracy and the required computation, memory, and storage. Their strengths and weaknesses are discussed and demonstrated on the Aurora2 noise-robust speech recognition task. The first option greatly reduces storage space and attains 93.2% accuracy, the same as the baseline, but yields little reduction in run-time computation or memory cost. The second option reduces the computation cost by about 79.9% and the memory requirement by about 33.5%, at the small price of a 0.5 percentage point drop in accuracy (to 92.7%). The third option cuts the computation cost by about 89.2% and the memory requirement by about 65.5%, while reducing recognition accuracy by 2.7 percentage points (to 90.5%).
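To make the idea of quantizing each dimension of the Gaussians concrete, the following is a minimal sketch of per-dimension scalar quantization of Gaussian mean vectors. It is an illustration under stated assumptions, not the SSC procedure of the paper: the abstract does not give the codebook training method, so a plain 1-D Lloyd (k-means) quantizer is assumed here, only the means are quantized (variances are left out), and all function names are hypothetical.

```python
def nearest(codebook, value):
    """Index of the codeword closest to `value`."""
    return min(range(len(codebook)), key=lambda k: abs(codebook[k] - value))


def train_codebook(values, num_codewords, iterations=20):
    """Train a 1-D codebook with Lloyd iterations (assumed training method)."""
    ordered = sorted(values)
    n = len(ordered)
    # Initialise codewords at evenly spaced quantiles of the data.
    codebook = [ordered[(2 * k + 1) * n // (2 * num_codewords)]
                for k in range(num_codewords)]
    for _ in range(iterations):
        buckets = [[] for _ in codebook]
        for v in values:
            buckets[nearest(codebook, v)].append(v)
        # Move each codeword to the mean of its bucket; keep it if the bucket is empty.
        codebook = [sum(b) / len(b) if b else c
                    for b, c in zip(buckets, codebook)]
    return codebook


def compress_means(means, num_codewords):
    """Quantize each dimension separately.

    `means` is a list of equal-length mean vectors. Returns one codebook per
    dimension and, for every Gaussian, one small integer index per dimension.
    """
    dims = len(means[0])
    codebooks = [train_codebook([m[d] for m in means], num_codewords)
                 for d in range(dims)]
    indices = [[nearest(codebooks[d], m[d]) for d in range(dims)]
               for m in means]
    return codebooks, indices


def decompress_means(codebooks, indices):
    """Rebuild approximate mean vectors from codebooks and indices."""
    return [[codebooks[d][k] for d, k in enumerate(row)] for row in indices]
```

With such a scheme, each Gaussian stores one small index per dimension instead of a floating-point value, and only the per-dimension codebooks hold real numbers. This kind of representation is what makes reduced storage possible; how the paper's three options apply JAC/VTS on top of the compressed parameters is not specified in the abstract and is not modeled here.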

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.