From Flat Direct Models to Segmental CRF Models
- Geoffrey Zweig
- Patrick Nguyen
ICASSP
Published by IEEE
This paper summarizes recent work at Microsoft on the development of novel direct models. The key characteristic of our approaches is the use of long-span, segment-level features that relate acoustic properties directly to words. In this approach, the frame-level Markov assumption is replaced by the segment-level Markov property, allowing us to extract long-span features. A key issue we address is the definition of generalizable features, which allow us to model unseen words. We review two recently developed models that have this property: Flat Direct Models (FDMs) and Segmental CRFs (SCRFs). The first operates in a log-linear framework and uses utterance-level features. The second is also a log-linear model, but defines features at the word-segment level. We present new experimental results comparing the two approaches. We find that both show consistent improvements over a baseline system, and that the extra context available to the FDM enables slightly better performance in a rescoring context. This gain comes at the expense of applicability to first-pass decoding, for which the SCRF is better suited.
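The contrast between the two log-linear models can be illustrated with a minimal sketch of a rescoring setting. It is not the authors' implementation: the function names `fdm_score`, `scrf_score`, and `loglinear_posteriors`, the feature vectors, and the weights are all hypothetical. As a simplifying assumption, each hypothesis is given a single fixed word segmentation, whereas a full segmental CRF would also account for alternative segmentations.

```python
import math

def fdm_score(utterance_features, weights):
    """Log-linear score from ONE feature vector computed over the whole utterance."""
    return sum(w * f for w, f in zip(weights, utterance_features))

def scrf_score(segment_features, weights):
    """Log-linear score summed over word segments, one feature vector per segment.

    Assumption: a single fixed segmentation per hypothesis (illustration only).
    """
    return sum(
        sum(w * f for w, f in zip(weights, feats))
        for feats in segment_features
    )

def loglinear_posteriors(scores):
    """Normalize hypothesis scores into posteriors with a numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    weights = [0.5, -1.0]  # hypothetical feature weights
    # Two competing hypotheses scored with utterance-level features (FDM style).
    fdm_scores = [fdm_score([2.0, 1.0], weights), fdm_score([4.0, 0.5], weights)]
    print(loglinear_posteriors(fdm_scores))
    # The same idea with features attached to individual word segments (SCRF style).
    scrf_scores = [
        scrf_score([[2.0, 0.0], [0.0, 1.0]], weights),
        scrf_score([[3.0, 0.5], [1.0, 0.0]], weights),
    ]
    print(loglinear_posteriors(scrf_scores))
```

In this sketch the per-segment scores combine additively, which loosely mirrors why segment-level features are the more natural fit for first-pass decoding, while utterance-level features require a complete hypothesis before they can be evaluated.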
© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/