A Study on Multilingual Acoustic Modeling For Large Vocabulary ASR

Hui Lin; Li Deng; Chin-Hui Lee; Dong Yu; Alex Acero; Yifan Gong

Proceedings of the ICASSP | April 2009

Published by Institute of Electrical and Electronics Engineers, Inc.

We study key issues in multilingual acoustic modeling for automatic speech recognition (ASR) through a series of large-scale ASR experiments. Our study explores the shared structures embedded in a large collection of speech data spanning a number of spoken languages, in order to establish a common set of universal phone models that can be used for large vocabulary ASR of all languages, whether seen or unseen during training. We compare language-universal and language-adaptive models with language-specific models. The results show that in many cases it is possible to build general-purpose language-universal and language-adaptive acoustic models that outperform language-specific ones, provided that the set of shared units, the structure of shared states, and the shared acoustic-phonetic properties among different languages are properly utilized. Specifically, our results demonstrate that when the context coverage is poor in language-specific training, we can use one tenth of the adaptation data to achieve equivalent performance in cross-lingual speech recognition.

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.