Exploiting frequency-scaling invariance properties of the scale transform for automatic speech recognition.

ICSLP 2000

This paper presents an experimental study of applying the scale transform to improve the performance of speaker-independent continuous speech recognition. Three major results are described. First, scale-transform-based magnitude cepstrum coefficients (STCC) were compared with mel-scale filter bank cepstrum coefficients (MFCC) on a telephone-based connected digit recognition task. The STCC were shown to obtain performance close to that of the MFCC. Second, a simple frequency-normalization procedure applied to the scale-transform representation improved performance on the connected digit recognition task with respect to the MFCC. Finally, in a more controlled experimental setting using the TIMIT database, applying phone-specific frequency warpings was shown to improve phone classification performance over using a single speaker-specific warping. This last result may have general implications for all frequency-warping-based speaker normalization procedures.
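The frequency-scaling invariance named in the title can be illustrated numerically. The sketch below is a toy example and not the STCC front end evaluated in the paper: it assumes the standard definition of the scale transform, D(c) = (1/sqrt(2*pi)) * integral over f > 0 of S(f) * exp((-jc - 1/2) ln f) df, and evaluates it as a Fourier transform over log-frequency. The function names, the two-bump spectral envelope, and the warping factor `alpha` are hypothetical choices made only for illustration.

```python
import numpy as np

def scale_transform_magnitude(func, c_values, x_min=-10.0, x_max=10.0, n=8192):
    """Approximate |D(c)| for a function defined on f > 0.

    With the substitution f = exp(x), the scale transform becomes the
    Fourier transform of func(exp(x)) * exp(x / 2), integrated here with
    a plain Riemann sum on a uniform log-frequency grid.
    """
    x = np.linspace(x_min, x_max, n)
    dx = x[1] - x[0]
    g = func(np.exp(x)) * np.exp(x / 2.0)
    kernel = np.exp(-1j * np.outer(c_values, x))
    return np.abs(kernel @ g) * dx / np.sqrt(2.0 * np.pi)

def envelope(f):
    # Hypothetical spectral envelope: two smooth bumps in log-frequency.
    return (np.exp(-0.5 * ((np.log(f) - 0.0) / 0.3) ** 2)
            + 0.5 * np.exp(-0.5 * ((np.log(f) - 1.0) / 0.2) ** 2))

alpha = 1.2  # assumed speaker-dependent frequency scaling factor

def warped(f):
    # Frequency-scaled envelope; sqrt(alpha) keeps the energy unchanged.
    return np.sqrt(alpha) * envelope(alpha * f)

c = np.linspace(0.0, 10.0, 21)
mag_orig = scale_transform_magnitude(envelope, c)
mag_warp = scale_transform_magnitude(warped, c)

# Scaling the frequency axis only shifts the function in log-frequency,
# so the two magnitudes should agree up to numerical error.
max_diff = np.max(np.abs(mag_orig - mag_warp))
print(max_diff)
```

In this sketch the scaling by `alpha` appears only as a phase factor in D(c), which is why the magnitudes agree. Cepstrum-like features computed from such a magnitude would inherit the same insensitivity to a uniform frequency scaling, under the assumptions stated above.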