Whispering Wearables: Multimodal Approach to Silent Speech Recognition with Head-Worn Devices
- Tanmay Srivastava
- R. Michael Winters
- Yu-Te Wang
- Thomas M. Gable
- Teresa LaScala
- Ivan Tashev
International Conference on Multimodal Interaction
Published by ACM | Organized by ACM
Silent speech recognition has emerged as a promising approach for enabling hands-free and discreet interaction with head-worn devices. In this paper, we present QuietSync, a multimodal system that combines inertial measurement unit (IMU) and contact electrode (ExG) signals to achieve accurate silent speech recognition using off-the-shelf devices. QuietSync utilizes an IMU attached to the lower part of the headphones near the ear and strategically places ExG electrodes on the headphones, glasses (nose and behind the ear), and face (for VR applications) to capture subtle movements and muscle activity associated with silent speech production. We conducted a user study with 9 participants and successfully recognized 12 commands with an accuracy of 94.2%. Our system leverages the complementary nature of IMU and ExG signals to enhance the robustness and reliability of silent speech recognition. The IMU captures subtle movements of the jaw and facial muscles, while the ExG electrodes detect low-amplitude surface muscle activity associated with speech production. We show that our system is not affected by the length or speech mannerisms of the commands, and that it can be fine-tuned for users of varied native languages with only 5 samples. Our findings demonstrate the feasibility of using off-the-shelf head-worn devices to enable silent speech recognition, opening up new possibilities for seamless and discreet interaction with devices such as VR/AR headsets and earables. To the best of our knowledge, QuietSync is the first system to enable silent speech interaction for multiple form factors.
Keywords: Silent speech recognition; Accessibility; ExG and IMU sensing
Affiliations: Thomas M. Gable, Microsoft Corporation, United States; Ivan J. Tashev, Microsoft Research Labs, Microsoft Corporation, United States
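The abstract does not say which features or model QuietSync uses, so the following is only a rough illustration of what combining IMU and ExG signals can look like in code. It assumes feature-level fusion (per-channel RMS amplitude from each modality, concatenated into one vector) and a nearest-centroid classifier as a stand-in. Every function name, the window layout, and the choice of classifier are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

# A window is a list of time samples; each sample holds one value per channel.
Window = Sequence[Sequence[float]]


def channel_rms(window: Window) -> List[float]:
    """Root-mean-square amplitude of each channel in one window."""
    n_samples = len(window)
    n_channels = len(window[0])
    return [
        math.sqrt(sum(row[c] ** 2 for row in window) / n_samples)
        for c in range(n_channels)
    ]


def fuse_features(imu_window: Window, exg_window: Window) -> List[float]:
    """Feature-level fusion (assumed): IMU features followed by ExG features."""
    return channel_rms(imu_window) + channel_rms(exg_window)


def fit_centroids(
    features: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Average the fused feature vectors of each command label."""
    grouped: Dict[str, List[Sequence[float]]] = {}
    for vector, label in zip(features, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids: Dict[str, List[float]], feature: Sequence[float]) -> str:
    """Return the command whose centroid is closest in Euclidean distance."""
    def distance(label: str) -> float:
        return math.sqrt(
            sum((a - b) ** 2 for a, b in zip(centroids[label], feature))
        )
    return min(centroids, key=distance)
```

In this sketch, a handful of labelled windows per command would be passed through `fuse_features` and `fit_centroids`, and new windows classified with `predict`. Recomputing centroids from a few samples of a new user is one simple way to picture per-user adaptation, though the paper's actual fine-tuning procedure may differ.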