A Discriminative Training Framework Using N-Best Speech Recognition Transcriptions and Scores for Spoken Utterance Classification

Sibel Yaman; Li Deng; Dong Yu; Ye-Yi Wang; Alex Acero

Proc. of the International Conference on Acoustics, Speech and Signal Processing | April 2007

Published by the Institute of Electrical and Electronics Engineers, Inc.

In this paper, we propose a novel discriminative training approach to spoken utterance classification (SUC). The SUC task maps a spoken utterance to the most appropriate semantic class, and its ultimate objective is to minimize the classification error rate (CER). Conventionally, a two-phase approach is adopted: the first phase is automatic speech recognition (ASR) transcription, and the second phase is semantic classification of the transcription. In the proposed framework, the CER is approximated as a differentiable function of the language model and classifier parameters. Furthermore, to exploit all the information available from the first phase, class-specific discriminant functions are defined based on score functions derived from the N-best lists. Experimental results on the standard ATIS database show a notable reduction in CER relative to the previous best result on the identical task: the proposed framework reduced the CER from 4.92% to 4.04%, an absolute reduction of 0.88 percentage points (about 17.9% relative).
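The abstract does not give the exact form of the discriminant functions or of the differentiable CER approximation. The sketch below is a minimal illustration that assumes the standard minimum classification error (MCE) construction. Each class discriminant is taken to be a log-sum-exp over the N-best list of ASR score plus classifier score. The misclassification measure compares the correct class against a smoothed maximum of the competitors. A sigmoid then turns that measure into a smooth 0/1 loss. The data layout, the function names (`discriminant`, `misclassification`, `smoothed_cer`, `hard_cer`), the hyperparameters `eta`, `gamma`, and `theta`, and the toy ATIS-like classes are all hypothetical, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout (an assumption, not the paper's format): each N-best
# entry is (asr_score, {class_name: classifier_score}), both in the log domain.
NBestEntry = Tuple[float, Dict[str, float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminant(nbest: List[NBestEntry], cls: str) -> float:
    """Assumed class discriminant: soft-max over the N-best list of the
    ASR score plus the classifier score for `cls`."""
    return logsumexp([asr + scores[cls] for asr, scores in nbest])


def misclassification(nbest: List[NBestEntry], label: str,
                      classes: Sequence[str], eta: float = 1.0) -> float:
    """MCE-style measure: d > 0 means the utterance is misclassified."""
    competitors = [eta * discriminant(nbest, c) for c in classes if c != label]
    # Smoothed maximum over the competing classes (approaches max as eta grows).
    anti = (logsumexp(competitors) - math.log(len(competitors))) / eta
    return -discriminant(nbest, label) + anti


def smoothed_cer(data, classes, gamma: float = 1.0, theta: float = 0.0) -> float:
    """Differentiable CER surrogate: mean sigmoid of the misclassification measure."""
    losses = [sigmoid(gamma * misclassification(nb, y, classes) - theta)
              for nb, y in data]
    return sum(losses) / len(losses)


def hard_cer(data, classes) -> float:
    """Actual CER: fraction of utterances whose top discriminant is wrong."""
    errors = sum(max(classes, key=lambda c: discriminant(nb, c)) != y
                 for nb, y in data)
    return errors / len(data)


# Toy example with made-up scores: the first utterance is classified
# correctly, the second is not.
classes = ["flight", "fare", "ground"]
data = [
    ([(-10.0, {"flight": 2.0, "fare": 0.5, "ground": -1.0}),
      (-11.0, {"flight": 1.0, "fare": 1.5, "ground": 0.0})], "flight"),
    ([(-9.0, {"flight": 1.0, "fare": 0.0, "ground": -2.0})], "fare"),
]
print(hard_cer(data, classes))                            # 0.5
print(round(smoothed_cer(data, classes, gamma=50.0), 3))  # 0.5
```

Because every step is smooth in the scores, the surrogate is differentiable with respect to any parameters those scores depend on. In this sketch the ASR score would carry the language model contribution and the classifier score the classifier weights, so both could, under these assumptions, be updated by gradient descent. As `gamma` grows, the sigmoid sharpens toward a step function and the surrogate approaches the actual CER, as the toy example shows.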

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.