Robust Speech Recognition by Normalization of the Acoustic Space
- Alex Acero
- Richard Stern
Proc. of the International Conference on Acoustics, Speech and Signal Processing
Published by Institute of Electrical and Electronics Engineers, Inc.
In this paper we present several algorithms that increase the robustness of SPHINX, the CMU speaker-independent continuous-speech recognition system, by normalizing the acoustic space through minimization of the overall VQ distortion. We propose an affine transformation of the cepstrum in which a matrix multiplication performs frequency normalization and a vector addition attempts to normalize for the environment. The environment-normalization algorithms are very efficient, and they dramatically improve recognition accuracy when the system is tested on a microphone other than the one on which it was trained. The frequency-normalization algorithm applies a speaker-specific warping of the frequency axis and achieves a 10% decrease in error rate.
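The abstract does not spell out how the environment-normalization vector is estimated. As an illustration only, here is a minimal sketch assuming squared-Euclidean VQ distortion and a fixed codebook trained on the original microphone. It chooses the additive cepstral bias b (the vector-addition half of the affine transform x' = A x + b, with A left as the identity) by alternating nearest-codeword assignment with a closed-form bias update. The function names and the alternating scheme are hypothetical and are not taken from SPHINX; frequency warping (the matrix A) is not modeled.

```python
def nearest(codebook, v):
    """Return the codeword closest to v in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((ci - vi) ** 2 for ci, vi in zip(c, v)))


def vq_distortion(frames, codebook, bias):
    """Average squared distance from each bias-shifted frame to its nearest codeword."""
    total = 0.0
    for x in frames:
        y = [xi + bi for xi, bi in zip(x, bias)]
        c = nearest(codebook, y)
        total += sum((ci - yi) ** 2 for ci, yi in zip(c, y))
    return total / len(frames)


def estimate_bias(frames, codebook, iterations=10):
    """Hypothetical bias estimator: minimize VQ distortion over an additive bias b.

    With the codeword assignments k(t) held fixed, the b minimizing
    sum_t ||x_t + b - c_k(t)||^2 is the mean of (c_k(t) - x_t), so each pass
    re-assigns frames and then applies that closed-form update. Neither step
    can increase the total distortion.
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(iterations):
        sums = [0.0] * dim
        for x in frames:
            y = [xi + bi for xi, bi in zip(x, bias)]
            c = nearest(codebook, y)
            for d in range(dim):
                sums[d] += c[d] - x[d]
        bias = [s / len(frames) for s in sums]
    return bias


if __name__ == "__main__":
    # Toy 2-D "cepstra": the new microphone shifts every frame by (-3, +2).
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    frames = [[-3.0, 2.0], [7.0, 12.0]]
    bias = estimate_bias(frames, codebook)
    print(bias)  # [3.0, -2.0]
    print(vq_distortion(frames, codebook, [0.0, 0.0]), vq_distortion(frames, codebook, bias))
```

In this toy case the estimated bias exactly undoes the simulated channel offset, and the average distortion drops from 13.0 to 0.0. Real cepstral vectors would need a trained codebook and many frames, and the paper's own algorithms may differ from this sketch.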
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.