Unsupervised speaker segmentation of telephone conversations.

Aaron E. Rosenberg; Allen Gorin; Zhu Liu; Sarangarajan Parthasarathy

ICSLP 2002 | September 2002

Organized by ISCA

A process for segmenting 2-speaker telephone conversations by speaker, with no prior speaker models, is described and evaluated. The process consists of an initial segmentation using acoustic change and pause detection, followed by segment clustering, and then iterative modeling of the segment clusters and resegmentation. The technique has been evaluated on 6 customer care conversations, each approximately 3 min long. The technique does not resolve short (< 2 secs) or overlapping segments very well, but is capable of detecting longer segments (> 4 secs) with miss rates of the order of 10% and confusion rates of 2% or less.
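
The cluster-then-remodel loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: each frame is a single scalar feature, the initial segments are taken as given (acoustic change and pause detection are not shown), clustering is a plain 2-means on segment means, each speaker is modeled by a single Gaussian, and "resegmentation" is simplified to relabeling fixed segments rather than moving boundaries. All function names are hypothetical.

```python
import math
import random


def two_means(values, iters=20):
    """Cluster scalar values into two groups with 1-D k-means (k = 2)."""
    c0, c1 = min(values), max(values)  # start from the extreme values
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


def fit_gaussian(frames):
    """Return (mean, variance) of the frames, with a small variance floor."""
    mu = sum(frames) / len(frames)
    var = sum((x - mu) ** 2 for x in frames) / len(frames)
    return mu, max(var, 1e-6)


def log_likelihood(frames, model):
    """Total Gaussian log-likelihood of the frames under (mean, variance)."""
    mu, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x in frames)


def segment_speakers(segments, max_iters=10):
    """Assign each segment to speaker 0 or 1 without prior speaker models."""
    # Step 1: cluster the initial segments by their mean feature value.
    labels = two_means([sum(s) / len(s) for s in segments])
    for _ in range(max_iters):
        # Step 2: model each cluster from the frames currently assigned to it.
        pooled = [[x for s, lab in zip(segments, labels) if lab == spk for x in s]
                  for spk in (0, 1)]
        if not pooled[0] or not pooled[1]:
            break  # one cluster is empty; keep the current labels
        models = [fit_gaussian(p) for p in pooled]
        # Step 3: relabel every segment with the better-scoring model.
        new_labels = [
            0 if log_likelihood(s, models[0]) >= log_likelihood(s, models[1]) else 1
            for s in segments
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels


# Synthetic demo: two "speakers" with feature means 0.0 and 5.0.
rng = random.Random(0)
true_speakers = [0, 1, 0, 1, 1, 0]
demo_segments = [[rng.gauss(5.0 * spk, 1.0) for _ in range(50)]
                 for spk in true_speakers]
labels = segment_speakers(demo_segments)
print(labels)
```

In a realistic setting the scalar features would be replaced by multi-dimensional spectral features and the single Gaussians by richer speaker models. The control flow of clustering, modeling, relabeling, and repeating until the labels stop changing is the only part this sketch is meant to convey.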