Token-level Interpolation for Class-Based Language Models

Proc. IEEE ICASSP

Published by IEEE - Institute of Electrical and Electronics Engineers

We describe a method for interpolating class-based n-gram language models. Our algorithm extends the traditional EM-based approach that optimizes the perplexity of the training set with respect to a collection of n-gram language models combined linearly in probability space. Unlike prior work, however, it naturally supports context-dependent interpolation for class-based LMs. In addition, the method integrates seamlessly with the recently introduced word-phrase-entity (WPE) language models, which unify words, phrases and entities into a single statistical framework. Applied to the Calendar scenario of the Personal Assistant domain, our method achieved significant perplexity reductions and improved word error rates.
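To make the baseline concrete, the sketch below shows the classical EM procedure for estimating context-independent linear interpolation weights over K component models, which is the starting point the abstract's token-level, context-dependent extension builds on. This is not code from the paper: the function name, data layout (per-token component probabilities precomputed on the tuning text), and toy values are illustrative assumptions.

```python
import math

def em_interpolation_weights(component_probs, iters=50, tol=1e-8):
    """Hypothetical helper: component_probs is a list of K lists, each
    holding P_k(w_t | h_t) for every token t of the tuning data.
    Returns the K mixture weights lambda_k maximizing the likelihood of
    the mixture P(w|h) = sum_k lambda_k * P_k(w|h)."""
    K = len(component_probs)
    T = len(component_probs[0])
    lambdas = [1.0 / K] * K  # uniform initialization

    prev_ll = float("-inf")
    for _ in range(iters):
        # E-step: accumulate each component's posterior responsibility
        # for every token under the current weights.
        expected = [0.0] * K
        log_lik = 0.0
        for t in range(T):
            mix = sum(lambdas[k] * component_probs[k][t] for k in range(K))
            log_lik += math.log(mix)
            for k in range(K):
                expected[k] += lambdas[k] * component_probs[k][t] / mix
        # M-step: renormalize expected counts into new weights.
        lambdas = [e / T for e in expected]
        if log_lik - prev_ll < tol:  # likelihood is non-decreasing under EM
            break
        prev_ll = log_lik
    return lambdas

# Toy usage: two "models" scored on a 3-token tuning text.
p1 = [0.10, 0.20, 0.05]
p2 = [0.30, 0.01, 0.40]
print(em_interpolation_weights([p1, p2]))
```

In this global scheme a single weight vector is shared by all contexts; the paper's contribution, per the abstract, is to let the weights depend on context and to handle the class (and WPE) tokens that expand to multiple words.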