Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks

Dong Yu; Li Deng; Xiaodong He; Alex Acero

Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks

Dong Yu ,
Li Deng ,
Xiaodong He ,
Alex Acero

Proceedings of the ICASSP, Honolulu, Hawaii | April 2007

Published by IEEE

Recently, we have developed a novel discriminative training method named large-margin minimum classification error (LM-MCE) training that incorporates the idea of discriminative margin into the conventional minimum classification error (MCE) training method. In our previous work, this novel approach was formulated specifically for the MCE training using the sigmoid loss function and its effectiveness was demonstrated on the TIDIGITS task alone. In this paper two additional contributions are made. First, we formulate LM-MCE as a Bayes risk minimization problem whose loss function not only includes empirical error rates but also a margin-bound risk. This new formulation allows us to extend the same technique to a wide variety of MCE based training. Second, we have successfully applied LM-MCE training approach to the Microsoft internal large vocabulary telephony speech recognition task (with 2000 hours of training data and 120K of vocabulary) and achieved significant recognition accuracy improvement across-the-board. To our best knowledge, this is the first time that the large-margin approach is demonstrated to be successful in large-scale speech recognition tasks

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.http://www.ieee.org/