Voice Search of Structured Media Data

Young-In Song; Ye-Yi Wang; Y. C. Ju; Mike Seltzer; Ivan Tashev; Alex Acero

International Conference on Acoustics, Speech and Signal Processing | April 2009

Published by Institute of Electrical and Electronics Engineers, Inc.

This paper addresses the problem of using unstructured queries to search a structured database in voice search applications. Incorporating structural information in music metadata reduces the end-to-end search error by 15% on text queries and by up to 11% on spoken queries. Building on this, an HMM sequential rescoring model reduces the error rate by 28% on text queries and by up to 23% on spoken queries compared to the baseline system. In addition, a phonetic similarity model is introduced to compensate for speech recognition errors; it improves end-to-end search accuracy consistently across different levels of speech recognition accuracy.