A Flat Direct Model for Speech Recognition

Georg Heigold; Geoffrey Zweig; Xiao Li; Patrick Nguyen

ICASSP-2009 | January 2009

Published by IEEE

We introduce a direct model for speech recognition that assumes an unstructured, i.e., flat, text output. The flat model allows us to model arbitrary attributes and dependencies of the output. This differs from the HMMs typically used for speech recognition: the conventional approach models sequential data and makes rigid assumptions about the dependencies. HMMs have proven to be convenient and appropriate for large vocabulary continuous speech recognition. The task under consideration here, however, is the Windows Live Search for Mobile (WLS4M) task [1], a cellphone application that allows users to interact with web-based information portals. In particular, the set of valid outputs can be considered discrete and finite (although probably large, i.e., unseen events are an issue). Hence, a flat direct model lends itself to this task, making it straightforward and cheap to add different knowledge sources and dependencies. Using, e.g., HMM posterior, m-gram, and spotter features, significant improvements over the conventional HMM system were observed.
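As a rough illustration of the idea, the sketch below scores a finite set of candidate outputs directly, combining heterogeneous features in one weighted sum. It assumes a log-linear (softmax) form, which the abstract does not specify; the function name `flat_direct_posterior`, the feature names, the candidate strings, and all weights are hypothetical and chosen only for illustration.

```python
import math


def flat_direct_posterior(candidates, features, weights):
    """Return p(w | x) over a finite candidate set.

    Assumption: a log-linear model, p(w | x) proportional to
    exp(sum_i weight_i * feature_i(w, x)). Each candidate is scored as a
    whole ("flat"), so any feature of the full output can be added.
    """
    scores = {
        w: sum(weights.get(name, 0.0) * value
               for name, value in features[w].items())
        for w in candidates
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {w: math.exp(s - top) for w, s in scores.items()}
    norm = sum(exp_scores.values())
    return {w: e / norm for w, e in exp_scores.items()}


# Hypothetical candidates with made-up feature values: an HMM posterior,
# an m-gram (here bigram) indicator, and a word-spotter indicator.
candidates = ["pizza hut", "pizza hot"]
features = {
    "pizza hut": {"hmm_posterior": 0.4, "bigram:pizza_hut": 1.0, "spotter:hut": 1.0},
    "pizza hot": {"hmm_posterior": 0.6, "bigram:pizza_hot": 1.0},
}
# Hypothetical weights; in practice these would be trained on data.
weights = {
    "hmm_posterior": 1.0,
    "bigram:pizza_hut": 0.5,
    "spotter:hut": 0.5,
    "bigram:pizza_hot": -0.5,
}

posterior = flat_direct_posterior(candidates, features, weights)
best = max(posterior, key=posterior.get)
```

In this toy setup the candidate with the lower HMM posterior still wins once the m-gram and spotter features are added, which is the kind of knowledge-source combination the abstract describes. Adding a new knowledge source only requires a new feature name and weight.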

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/