This paper describes an algorithm for compressing the spectral representation of an utterance along the time axis while keeping its main features intact. The goal is to reduce both template storage space and recognition time. In experiments with 8 speakers and 5 data sets per speaker, the algorithm saved about 40% of the template space and 35% of the recognition time, with only a slightly higher error rate.
Effect of Reference Set Selection on Speaker Dependent Speech Recognition. Frame Compression in Isolated Word Recognition
- Zongge Li
- Fil Alleva
- Raj Reddy