We want to use eye gaze and face pose to understand what users are looking at and attending to, and to use this information to improve speech recognition. Any language constraint makes speech recognition and understanding easier, since we then know which words are likely to come next.
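As a minimal sketch of this idea (the function and variable names here are hypothetical, not from our system): given an estimate of which on-screen object the user is gazing at, words associated with that object can be boosted in the recognizer's language model before renormalizing.

```python
def gaze_biased_lm(word_probs, gazed_words, boost=5.0):
    """Rescale a unigram distribution, boosting words tied to the gaze target.

    word_probs:  dict mapping word -> probability (sums to 1)
    gazed_words: set of words associated with the attended object
    boost:       multiplicative factor for gaze-related words (illustrative)
    """
    biased = {
        w: p * (boost if w in gazed_words else 1.0)
        for w, p in word_probs.items()
    }
    total = sum(biased.values())  # renormalize so probabilities sum to 1
    return {w: p / total for w, p in biased.items()}

# Example: the user is looking at a "volume" slider on screen,
# so "volume" and "up" become more probable than "weather".
lm = {"volume": 0.02, "weather": 0.02, "up": 0.05, "play": 0.05}
biased = gaze_biased_lm(lm, gazed_words={"volume", "up"})
```

In a real system this bias would typically be applied during decoding (e.g. as a rescoring term on lattice hypotheses) rather than to a standalone unigram table, but the constraint works the same way: gaze shrinks the effective search space.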
Our work has shown significant performance improvements at all stages of the speech-processing pipeline, including addressee detection, speech recognition, and spoken-language understanding.