The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System
- Andreas Stolcke
- Kofi Boakye
- Özgür Çetin
- Adam Janin
- Mathew Magimai-Doss
- Chuck Wooters
- Jing Zheng
Multimodal Technologies for Perception of Humans. International Evaluation Workshops CLEAR 2007 and RT 2007
Published by Springer
We describe the latest version of the SRI-ICSI meeting and lecture recognition system, as used in the NIST RT-07 evaluations, highlighting improvements made over the last year. Changes in the acoustic preprocessing include updated beamforming software for processing of multiple distant microphones, and various adjustments to the speech segmenter for close-talking microphones. Acoustic models were improved by the combined use of neural-net-estimated phone posterior features, discriminative feature transforms trained with fMPE-MAP, and discriminative Gaussian estimation using MPE-MAP, as well as model adaptation specifically to nonnative and non-American speakers.
The net effect of these enhancements was a 14-16% relative error reduction on distant microphones, and a 16-17% error reduction on close-talking microphones. Also, for the first time, we report results on a new “coffee break” meeting genre, and on a new NIST metric designed to evaluate combined speech diarization and recognition.
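As a reading aid for the figures above, a relative error reduction is conventionally the drop in error rate divided by the baseline error rate. The sketch below illustrates that arithmetic only. The function name and the two word error rates are hypothetical values chosen for illustration; they are not results from the paper.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative reduction in error rate, as a percentage."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical word error rates (in percent), for illustration only;
# these are NOT numbers reported in the paper.
baseline = 40.0
improved = 34.0
print(f"{relative_error_reduction(baseline, improved):.1f}% relative reduction")
```

Under these assumed values, a 6-point absolute drop from a 40% baseline corresponds to a 15% relative reduction, which is the kind of quantity the abstract's 14-16% and 16-17% ranges describe.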