Foldering voicemail messages by caller using text independent speaker recognition.

Aaron E. Rosenberg; Sarangarajan Parthasarathy; Julia Hirschberg; Stephen Whittaker

ICSLP 2000 | October 2000

The ability to automatically scan voicemail messages for content and caller identity cues would be a useful service. This paper describes a system that automatically files voicemail messages into caller folders using text-independent speaker recognition techniques. Callers are represented by Gaussian mixture models (GMMs). The speech in an incoming message is processed and scored against the caller models created for a subscriber. A message whose matching score exceeds a threshold is filed in the corresponding caller folder; otherwise it is tagged as “unknown”. The subscriber can listen to an “unknown” message and file it in the proper folder if one exists, or create a new folder if it does not. These subscriber-labelled messages are used to train and adapt caller models. The system has been evaluated on a database of voicemail messages collected at AT&T Labs. A set of 20 callers from this database is designated as “ingroup”. Each of these callers has recorded at least 20 messages totalling 10 or more minutes in duration. A distinct set of 220 messages, each from a different caller, is designated as “outgroup”. Representative performance figures, with threshold parameters set so that outgroup acceptance is low compared with ingroup rejection, are as follows: the average ingroup message rejection rate is 11.0%, the average ingroup message confusion rate (matching the wrong caller) is 1.0%, and the average outgroup message acceptance rate is 2.7%.
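The filing rule and the three error rates in the abstract can be illustrated with a small sketch. It assumes diagonal-covariance GMMs, a message score equal to the average per-frame log-likelihood under each caller model, and a single global threshold. The paper's actual scoring (for example, any normalization against a background model), feature extraction, and per-caller averaging of the rates are not specified here, so `DiagonalGMM`, `file_message`, and `error_rates` are hypothetical names. Also, the rates below are pooled over messages rather than averaged per caller.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class DiagonalGMM:
    """Hypothetical caller model: a diagonal-covariance Gaussian mixture."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        # log p(x) = logsumexp_k [ log w_k + log N(x; mu_k, diag(var_k)) ]
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_gauss = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            terms.append(math.log(w) + log_gauss)
        top = max(terms)  # subtract the max for numerical stability
        return top + math.log(sum(math.exp(t - top) for t in terms))

    def score(self, frames: Sequence[Sequence[float]]) -> float:
        # Assumed message score: average per-frame log-likelihood.
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def file_message(frames, models: Dict[str, DiagonalGMM], threshold: float) -> str:
    """File into the best-scoring caller folder, or tag as "unknown"."""
    scores = {caller: model.score(frames) for caller, model in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else "unknown"


def error_rates(ingroup: List[Tuple[str, str]], outgroup: List[str]) -> Dict[str, float]:
    """ingroup: (true caller, decision) pairs; outgroup: decisions only."""
    n_in = len(ingroup)
    rejected = sum(1 for _, d in ingroup if d == "unknown")
    confused = sum(1 for t, d in ingroup if d not in ("unknown", t))
    accepted = sum(1 for d in outgroup if d != "unknown")
    return {
        "ingroup_rejection": rejected / n_in,
        "ingroup_confusion": confused / n_in,
        "outgroup_acceptance": accepted / len(outgroup),
    }


if __name__ == "__main__":
    models = {
        "alice": DiagonalGMM([1.0], [[0.0]], [[1.0]]),
        "bob": DiagonalGMM([1.0], [[5.0]], [[1.0]]),
    }
    print(file_message([[0.1], [-0.2]], models, threshold=-5.0))  # alice
    print(file_message([[20.0]], models, threshold=-5.0))         # unknown
```

Raising the threshold trades more ingroup rejections for fewer outgroup acceptances, which is the operating point the abstract describes (outgroup acceptance kept low relative to ingroup rejection).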