Evaluation of a language model using a clustered model backoff

  • John Miller
  • Fil Alleva

Fourth International Conference on Spoken Language, 1996. ICSLP 96. Proceedings.

Publication

This paper describes and evaluates a language model that uses word classes generated automatically by a word clustering algorithm. Class-based language models have been shown to be effective for rapid adaptation, training on small datasets, and reduced memory usage. In terms of model perplexity, however, prior work has shown diminishing returns for class-based language models built from very large training sets. The paper presents a method of using a class model as a backoff to a bigram model, which produced significant benefits even when trained on a large text corpus. Test results on the Whisper continuous speech recognition system show that, for a given word error rate, the clustered bigram model uses two-thirds fewer parameters than a standard bigram model with unigram backoff.
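To make the idea of a class model serving as the backoff for a bigram model concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the formulation used in the paper: the abstract does not give the discounting scheme or the clustering algorithm, so the sketch assumes absolute discounting, add-one smoothing of the class bigram, and a hand-written word-to-class map standing in for automatically generated clusters. The function names, the toy corpus, and the class labels are all hypothetical.

```python
from collections import Counter


def train(tokens, word2class):
    """Collect the counts needed by the word bigram and its class backoff.

    Assumes every word in word2class occurs at least once in tokens.
    """
    classes = [word2class[w] for w in tokens]
    return {
        "bigram": Counter(zip(tokens, tokens[1:])),
        "history": Counter(tokens[:-1]),        # counts of words used as history
        "word": Counter(tokens),
        "class_total": Counter(classes),
        "class_bigram": Counter(zip(classes, classes[1:])),
        "class_history": Counter(classes[:-1]),
        "word2class": word2class,
    }


def class_prob(m, prev, word):
    """Class backoff estimate: P(c(word) | c(prev)) * P(word | c(word))."""
    cp, cw = m["word2class"][prev], m["word2class"][word]
    k = len(m["class_total"])  # number of classes
    # Add-one smoothing is an assumption made only to keep the sketch short.
    p_class = (m["class_bigram"][(cp, cw)] + 1) / (m["class_history"][cp] + k)
    p_member = m["word"][word] / m["class_total"][cw]
    return p_class * p_member


def prob(m, prev, word, discount=0.5):
    """P(word | prev): discounted bigram if seen, else scaled class estimate."""
    n_prev = m["history"][prev]
    if n_prev == 0:
        # History never observed: rely entirely on the class model.
        return class_prob(m, prev, word)
    n_pair = m["bigram"][(prev, word)]
    if n_pair > 0:
        return (n_pair - discount) / n_prev
    # Probability mass freed by discounting is spread over unseen successors
    # in proportion to the class model, renormalised over those successors.
    seen = [w for (p, w) in m["bigram"] if p == prev]
    reserved = discount * len(seen) / n_prev
    unseen_mass = 1.0 - sum(class_prob(m, prev, w) for w in seen)
    return reserved * class_prob(m, prev, word) / unseen_mass


# Toy data: hand-assigned classes stand in for automatic clustering.
tokens = "the cat sat on the mat the dog sat on the rug a cat ran".split()
word2class = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN",
    "sat": "VERB", "ran": "VERB",
    "on": "PREP",
}
model = train(tokens, word2class)
vocab = sorted(set(tokens))

# Neither "a dog" nor "a sat" occurs in the corpus, but the class model
# has seen DET followed by NOUN often, so "a dog" scores higher.
p_a_dog = prob(model, "a", "dog")
p_a_sat = prob(model, "a", "sat")
```

In this sketch the only per-history parameters are the observed word bigrams, while unseen pairs are covered by a much smaller class-level table. That is the general reason a class backoff can let a model prune word bigrams at a lower cost than a unigram backoff would, although the specific savings reported above come from the paper's experiments, not from this toy example.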