Audio and Acoustics Research at Microsoft
Getting the sound right is a crucial ingredient in natural user interfaces, immersive gaming, realistic virtual and mixed reality, and ubiquitous computing. Audio also plays an important role in assistive technologies for people who are blind or have low vision, and speech recognition and processing can help support those who are deaf or hard of hearing. Although computers have been capable of playing and processing high-fidelity audio for many decades, there are many frontiers left to explore in computational recognition, analysis and rendering of sound for speech or immersive sound fields.
Audio has been a key research area since Microsoft Research was founded in 1991 – in its first year, researchers used audio data as well as other cues to explore automatic summarization of audiovisual presentations. Over the years, there have been steady and significant research advances in speech recognition, natural user interfaces, audio as a tool for collaboration and productivity, capturing and reproducing sound, spatial audio, acoustic simulation and audio analytics.
Many of these advances have shipped in Microsoft products and services like Windows 10, Kinect, HoloLens and Teams, as well as Ford’s SYNC in-car infotainment system, Polycom’s videoconferencing devices, and major game titles such as Gears of War, Sea of Thieves and Borderlands 3. Still more are working their way into future products and services, and into the hands of developers.
Use the timelines below to explore several threads of audio and acoustics research as they evolved from theories and experiments to real-world applications.
Speech recognition and natural user interfaces
-
2002
Microsoft researchers establish the Sound Capture and Speech Enhancement project
The Sound Capture and Speech Enhancement project begins to explore areas such as acoustic echo reduction, microphone array processing and noise reduction.
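A classic building block in this space is the delay-and-sum beamformer, which time-aligns the microphone signals toward a chosen direction before averaging them. The sketch below is a minimal, generic illustration; the array geometry, sample rate and steering direction are assumptions made for the example, not the project's shipped pipeline.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Time-align each channel toward `direction` (a unit vector) in the
    frequency domain, then average, so sound arriving from that direction
    adds coherently while off-axis sound and noise partially cancel."""
    num_samples = signals.shape[1]
    delays = mic_positions @ direction / c               # relative arrival time per mic (s)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    aligned = spectra * np.exp(-2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=num_samples)

# Example: a hypothetical 4-microphone linear array with 4 cm spacing,
# sampled at 16 kHz and steered broadside (straight ahead of the array).
fs = 16000
mics = np.array([[i * 0.04, 0.0, 0.0] for i in range(4)])
channels = np.random.randn(4, fs)          # 1 second of stand-in channel data
enhanced = delay_and_sum(channels, mics, np.array([0.0, 1.0, 0.0]), fs)
```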
-
2007
Ford releases SYNC
Ford releases the first version of its SYNC in-car infotainment system, with a speech enhancement audio pipeline first designed by Microsoft researchers.
-
2007
Windows support for microphone arrays
Microsoft releases Windows Vista, including support for four preselected microphone array geometries and standardized support for USB microphone arrays. Later, Windows 10 is updated to include support for microphone arrays with arbitrary geometry.
-
2010
Hands-free control in Kinect
Microsoft releases Kinect for Xbox 360, which includes the first hands-free, open-microphone command-and-control product with surround sound echo cancellation.
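Acoustic echo cancellation of this kind is commonly built on adaptive filters. The sketch below shows a generic normalized LMS (NLMS) echo canceller driven by a loudspeaker (far-end) reference and a microphone signal; the filter length, step size and simulated echo path are illustrative assumptions, not the Kinect implementation.

```python
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=256, mu=0.5, eps=1e-8):
    """Adaptively estimate the echo path from the loudspeaker to the mic,
    predict the echo, and subtract it from the microphone signal."""
    w = np.zeros(taps)                    # adaptive echo-path estimate
    buf = np.zeros(taps)                  # most recent far-end samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_hat = w @ buf                # predicted echo at the mic
        e = mic[n] - echo_hat             # echo-free (near-end) estimate
        w += mu * e * buf / (buf @ buf + eps)   # normalized LMS update
        out[n] = e
    return out

# Example: the mic picks up a delayed, attenuated copy of the far-end signal.
rng = np.random.default_rng(1)
far = rng.standard_normal(16000)
mic = 0.6 * np.concatenate([np.zeros(32), far[:-32]]) + 0.01 * rng.standard_normal(16000)
clean = nlms_echo_cancel(far, mic)
```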
-
2016
Microsoft releases HoloLens
Microsoft releases HoloLens, which contains a four-element microphone array and a sophisticated sound capture and speech enhancement system for capturing the voice of the wearer and the ambient sound environment.
-
2017
Researchers begin exploring neural networks for speech enhancement
Microsoft researchers establish the Neural Networks-Based Speech Enhancement project, which aims for more accurate and reliable speech processing, particularly on mobile, wearable, smart home and IoT devices. Unlike earlier devices, these present new challenges such as noisier background environments, greater speaker-microphone distances and limited edge processing capability.
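A common formulation of neural speech enhancement has a network predict a time-frequency mask that is applied to the noisy spectrogram. The sketch below shows that data flow with a tiny two-layer network whose weights are random and untrained; it illustrates only the mask-based structure and is not the project's model.

```python
import numpy as np

def stft(x, frame=512, hop=256):
    win = np.hanning(frame)
    frames = [x[i:i + frame] * win for i in range(0, len(x) - frame, hop)]
    return np.fft.rfft(np.array(frames), axis=1)        # (time, freq)

def istft(X, frame=512, hop=256):
    win = np.hanning(frame)
    out = np.zeros(hop * (len(X) - 1) + frame)
    for t, spec in enumerate(np.fft.irfft(X, n=frame, axis=1)):
        out[t * hop:t * hop + frame] += spec * win       # overlap-add
    return out

def enhance(noisy, w1, w2):
    X = stft(noisy)
    feats = np.log1p(np.abs(X))                          # log-magnitude features
    hidden = np.tanh(feats @ w1)
    mask = 1.0 / (1.0 + np.exp(-(hidden @ w2)))          # per-bin gain in [0, 1]
    return istft(X * mask)

rng = np.random.default_rng(0)
bins = 512 // 2 + 1
w1 = rng.standard_normal((bins, 128)) * 0.01             # untrained placeholder weights
w2 = rng.standard_normal((128, bins)) * 0.01
denoised = enhance(rng.standard_normal(16000), w1, w2)
```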
-
2019
Microsoft releases HoloLens 2
The device contains a five-element microphone array and a sophisticated sound capture and speech enhancement system for capturing the voice of the wearer as well as the ambient sound environment. Researchers had explored key components of its speech enhancement technology earlier that year.
-
2020
Speech enhancement incorporated into Microsoft Teams
Microsoft CEO Satya Nadella announces that new improvements to Microsoft Teams will include a neural network-based speech enhancement algorithm.
Audio for collaboration and productivity
-
1991
First audio-related paper published
Microsoft researchers publish their first audio-related paper, on the automatic summarization of multimedia presentations.
-
1996
Seeing the sound
Microsoft researchers explore ways to use vision data to capture and render sound in interactive environments.
-
1999
Progress in audio detection and classification
- Publication: Detection of target speakers in audio databases
-
2001
Project RingCam established
Microsoft researchers establish Project RingCam, to explore 360-degree videoconferencing.
-
2007
Microsoft RoundTable ships with speaker detection technology
Speaker detection technology developed by Microsoft researchers ships as part of the Microsoft RoundTable conferencing system.
The technology is later sold to Polycom and released as the Polycom CX5000.
Capturing and reproducing sound
-
1998
Researchers begin experimenting with microphone arrays
Microsoft researchers build their first microphone array, using an Erector set.
-
2005
USB microphone array prototypes
Microsoft researchers establish the Audio Devices project and build and evaluate two USB microphone array prototypes: a four-element linear array and an eight-element circular array.
-
2007
An anechoic chamber in Building 99
Microsoft Research Redmond moves into its new home in Building 99. The building includes the company’s first anechoic chamber.
-
2009
Anechoic chamber retrofitted to measure sound in 3D
The anechoic chamber in Building 99 is retrofitted to automatically measure 3D directivity and radiation patterns, including human spatial hearing. It uses a 3D scanner with sub-millimeter accuracy to measure the head and torso. Among other things, this enables the advancement of head-related transfer functions (HRTFs), which can enable more realistic-sounding spatial audio.
-
2012
Progress in microphone arrays
Microsoft researchers build a 16-channel spherical microphone array and a 16-channel cylindrical microphone array to study sound field decomposition using spherical and cylindrical basis functions. In 2016, they build a 64-channel spherical microphone array.
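For a spherical array, sound field decomposition typically means fitting the pressures measured at the capsules with a truncated spherical harmonic expansion. The sketch below shows such a least-squares fit; the capsule directions, expansion order and signals are illustrative assumptions rather than the actual device design.

```python
import numpy as np
from scipy.special import sph_harm   # complex Y_n^m(m, n, azimuth, polar angle)

def sh_basis(order, azimuth, colatitude):
    """Matrix of spherical harmonics, one column per (n, m) term."""
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            cols.append(sph_harm(m, n, azimuth, colatitude))
    return np.stack(cols, axis=1)        # shape: (num_mics, (order + 1) ** 2)

# Hypothetical 16-capsule array: random directions on the sphere stand in
# for the real capsule layout, which is not specified here.
rng = np.random.default_rng(0)
num_mics, order = 16, 3
azimuth = rng.uniform(0.0, 2 * np.pi, num_mics)
colatitude = np.arccos(rng.uniform(-1.0, 1.0, num_mics))

Y = sh_basis(order, azimuth, colatitude)
pressures = rng.standard_normal(num_mics)    # one snapshot of capsule signals

# Least-squares fit of the (order + 1)^2 expansion coefficients.
coeffs, *_ = np.linalg.lstsq(Y, pressures.astype(complex), rcond=None)
print(coeffs.shape)                           # (16,) for order 3
```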
-
2017
A new approach to gesture recognition
Ultrasound-based Gesture Recognition – this paper introduces a new approach to gesture recognition using ultrasound waves, which requires significantly less power than optical systems (a minimal sketch of the general idea appears after the publication below).
- Publication: Multimodal Gesture Recognition
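One common way ultrasound gesture sensing works is Doppler-shift detection: the device plays an inaudible tone and looks for microphone energy shifted above or below the carrier as a hand moves. The sketch below illustrates that general idea only, not necessarily the method of the paper above; the carrier frequency, band limits and simulated reflection are assumptions.

```python
import numpy as np

def doppler_direction(mic_frame, fs, carrier=20000.0, guard=50.0, band=300.0):
    """Return +1 (motion toward the device), -1 (away) or 0, by comparing
    spectral energy just above and just below the ultrasonic carrier."""
    spectrum = np.abs(np.fft.rfft(mic_frame * np.hanning(len(mic_frame))))
    freqs = np.fft.rfftfreq(len(mic_frame), d=1.0 / fs)
    upper = spectrum[(freqs > carrier + guard) & (freqs < carrier + band)].sum()
    lower = spectrum[(freqs < carrier - guard) & (freqs > carrier - band)].sum()
    if abs(upper - lower) < 1e-6:
        return 0
    return 1 if upper > lower else -1

# Example: the mic hears the loud carrier plus a weak reflection shifted up
# by 120 Hz, as if a hand were moving toward the device.
fs, n = 48000, 4800                        # 10 Hz bins; carrier falls exactly on a bin
t = np.arange(n) / fs
frame = np.sin(2 * np.pi * 20000 * t) + 0.2 * np.sin(2 * np.pi * 20120 * t)
print(doppler_direction(frame, fs))        # 1, i.e. approaching
```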
-
2018
Live 360 audio and video streaming
-
2019
Project Denmark established
Microsoft researchers establish Project Denmark, which aims to achieve high-quality capture of meeting conversations using virtual microphone arrays composed of ordinary consumer devices such as mobile phones and laptops.
Spatial audio
-
2012
New directions for spatial audio
Microsoft researchers begin exploring new approaches to head-related transfer functions (HRTFs), which represent the acoustic transfer function from a sound source at a given location to a listener's eardrums. A potential outcome of this work is more realistic spatial audio, tuned to the shape of the listener's head and torso.
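Once an HRTF (or its time-domain counterpart, a head-related impulse response) is known for a direction, binaural rendering reduces to filtering the source with the left- and right-ear responses. The sketch below uses tiny placeholder impulse responses to show the structure; measured HRTF sets contain far longer filters for hundreds of directions.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono source with per-ear impulse responses and return a
    (num_samples, 2) stereo signal for headphone playback."""
    left = np.convolve(mono, hrir_left)[:len(mono)]
    right = np.convolve(mono, hrir_right)[:len(mono)]
    return np.stack([left, right], axis=1)

# Toy HRIRs for a source to the listener's left: the right ear hears the
# sound slightly later (interaural time difference) and quieter
# (interaural level difference) than the left ear.
hrir_l = np.array([1.0, 0.3])
hrir_r = np.concatenate([np.zeros(30), [0.5, 0.15]])    # ~0.6 ms later at 48 kHz

mono = np.random.randn(48000)                           # 1 s of source audio
stereo = render_binaural(mono, hrir_l, hrir_r)
```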
-
2015
Virtual surround sound in Windows 10
Microsoft releases Windows 10 with support for virtual surround sound, marketed as Windows Sonic. This spatial audio rendering system is later released as part of HoloLens.
-
2016
Personalized audio rendering in HoloLens
Microsoft releases HoloLens. The device features an audio rendering system with on-the-fly personalization to the wearer's spatial hearing.
-
2016
Microsoft releases the Windows Mixed Reality platform
Windows 10 includes support for virtual and mixed reality headsets manufactured by other companies. The platform contains an extended and improved version of the spatial audio engine.
-
2017
A map delivered in 3D sound
Microsoft releases Soundscape, developed in collaboration with Guide Dogs UK – an app for people who are blind or have low vision that uses a spatial audio rendering system to deliver map information in 3D sound.
-
2018
Podcast: Hearing in 3D with Dr. Ivan Tashev
Acoustic simulation
-
2010
Microsoft researchers establish Project Triton
Prior to 2010, a key challenge in interactive audio had been the fast modeling of wave effects in complex game scenes: smooth sound obstruction around doorways, or dynamic reverberation responsive to both source and listener motion. Microsoft researchers introduce the idea of pre-computing physically accurate wave simulations and show that it is a viable path forward for interactive audio and games.
Project Triton explores this physics-based approach to modeling virtual environments for more realistic in-game audio.
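Conceptually, the precompute-then-render idea turns expensive wave simulation into an offline step whose results are reduced to a few perceptual parameters that the game looks up and applies at runtime. The sketch below illustrates that pattern with an assumed grid, parameter set and gain model; it is not Project Triton's actual encoding or renderer.

```python
import numpy as np

CELL = 2.0                                    # meters per probe-grid cell (assumed)

def cell(pos):
    """Quantize a 3D position to a coarse grid cell used as a lookup key."""
    return tuple(int(c) for c in np.floor(np.asarray(pos, dtype=float) / CELL))

# Precomputed table: (source cell, listener cell) -> (obstruction dB, decay s).
# In practice such values would come from offline wave simulation of the scene.
acoustics = {
    (cell((0, 0, 0)), cell((5, 0, 0))): (-3.0, 0.4),    # nearly line of sight
    (cell((0, 0, 0)), cell((5, 6, 0))): (-18.0, 1.2),   # heard around a corner
}

def apply_acoustics(dry, source_pos, listener_pos, fs=16000):
    """Look up precomputed parameters for this source/listener pair and
    apply them: an obstruction gain plus a crude exponential reverb tail."""
    obstruction_db, decay_s = acoustics.get(
        (cell(source_pos), cell(listener_pos)), (-60.0, 0.5))
    gain = 10.0 ** (obstruction_db / 20.0)
    tail = np.exp(-np.arange(int(decay_s * fs)) / (decay_s * fs / 6.9))  # ~-60 dB at the end
    reverb = np.convolve(dry, tail)[:len(dry)] / len(tail)
    return gain * dry + 0.2 * gain * reverb

out = apply_acoustics(np.random.randn(8000), (0, 0, 0), (5, 6, 0))
```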
-
2012
Researchers begin collaboration with game studios
Microsoft researchers begin collaborating with The Coalition to incorporate this acoustic simulation work into Gears of War, transitioning from exploratory research to a targeted redesign focused on performance and flexibility.
- 2013: The first working prototype of Project Triton is demonstrated internally.
- 2014: A paper describes the core design of Project Triton, combining perceptual coding, spatial compression and parametric rendering. The design addresses the problem of system resource usage and integrates easily into existing audio tools. Later work builds on this core design, with various improvements.
- 2015: A Microsoft Research summer intern researches a novel adaptive sampling approach to resolve a key robustness issue in Project Triton.
-
2016
Project Triton ships in Gears of War 4
Project Triton ships as part of Gears of War 4 – the first instance of game acoustics provided by accurate physics-based simulation.
-
2017
Project Triton in Virtual and Mixed Reality
After years of development and refinement for use in games, Project Triton is used in the Mixed Reality experience shipped as part of the Windows 10 Fall Creators Update. It provides a natural acoustic experience in the virtual “cliffhouse” space, with new directional acoustics features such as sound that is obstructed by virtual objects, or heard as if coming around corners or through doorways. This experience also incorporates advances in HRTFs described in the previous timeline.
In 2018, Project Triton ships as part of Sea of Thieves, the second game to incorporate this technology. The game includes custom modifications for evaluating acoustics modularly, illustrating the flexibility of the system.
-
2019
Podcast: Project Triton and the Physics of Sound with Dr. Nikunj Raghuvanshi
-
2019
Project Triton technology released as Project Acoustics
Microsoft makes Project Triton technology available to developers as Project Acoustics, including Unity and Unreal plugins for easy integration into games and research prototypes.
- 2019: Gears of War 5 ships, with an immersive audio experience that combines headphone rendering technologies such as Windows Sonic and Dolby Atmos with Triton’s scene-informed sound propagation.
- 2019: Borderlands 3 ships – the first game from a studio outside Microsoft to employ Project Triton.
-
2020
Project Acoustics incorporated into HoloLens
This milestone marks the first demonstration of physical acoustics in augmented reality.
-
2020
Webinar on sound simulation with Dr. Nikunj Raghuvanshi
In this webinar, Microsoft Principal Researcher Dr. Nikunj Raghuvanshi covers the ins and outs of creating practical, high-quality sound simulations. It includes an overview of the three components of sound simulation – synthesis, propagation and spatialization – with a focus on Project Triton. For each, he reviews the underlying physics, research techniques, practical considerations and open research questions.
Audio analytics
-
2010
Audio Analytics project established
Microsoft researchers establish the Audio Analytics project, to explore research directions such as extracting non-verbal cues from human speech, detecting specific audio events and background noise, and audio search and retrieval. Potential applications include customer satisfaction analysis from customer support calls, media content analysis and retrieval, medical diagnostic aids and patient monitoring, assistive technologies for people with hearing impairments, and audio analysis for public safety.
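As a simple illustration of one such direction, audio event detection can be sketched as framing the signal, computing per-frame energy and flagging frames that rise well above the estimated background level. Production systems use learned classifiers on richer features; the energy threshold below is only an illustrative assumption.

```python
import numpy as np

def detect_events(x, fs, frame_ms=32, hop_ms=16, threshold_db=12.0):
    """Return (start, end) times, in seconds, of frames whose energy rises
    well above the estimated background level."""
    frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    starts = range(0, len(x) - frame, hop)
    energy_db = np.array([
        10 * np.log10(np.mean(x[s:s + frame] ** 2) + 1e-12) for s in starts])
    background = np.median(energy_db)             # rough noise-floor estimate
    events = energy_db > background + threshold_db
    return [(s / fs, (s + frame) / fs) for s, hit in zip(starts, events) if hit]

# Example: quiet noise with a short loud burst around 0.5 s.
fs = 16000
x = 0.01 * np.random.randn(fs)
x[8000:8800] += np.sin(2 * np.pi * 880 * np.arange(800) / fs)
print(detect_events(x, fs))      # time ranges overlapping the burst
```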
-
2015
“Hey, Cortana” uses speaker identification
Microsoft releases Windows 10 with speaker identification as part of the “Hey, Cortana” wake-up feature.
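Speaker identification for a wake-up feature can be pictured as comparing an embedding of the incoming utterance against a profile enrolled by the device owner. The sketch below uses a placeholder embedding (an averaged log spectrum) and cosine similarity to show the structure; a real system relies on a trained speaker-embedding model, and none of the names or thresholds here come from the product.

```python
import numpy as np

def embed(utterance, frame=512, hop=256):
    """Placeholder embedding: average log-magnitude spectrum over frames."""
    win = np.hanning(frame)
    frames = [utterance[i:i + frame] * win
              for i in range(0, len(utterance) - frame, hop)]
    spectra = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return np.log(spectra + 1e-8).mean(axis=0)

def is_enrolled_speaker(utterance, profile, threshold=0.95):
    """Accept the utterance if its embedding is close to the enrolled profile."""
    e = embed(utterance)
    cosine = e @ profile / (np.linalg.norm(e) * np.linalg.norm(profile) + 1e-12)
    return cosine >= threshold

# Enrollment: average embeddings from a few utterances of the device owner.
rng = np.random.default_rng(0)
enrollment = [rng.standard_normal(16000) for _ in range(3)]   # stand-in audio
profile = np.mean([embed(u) for u in enrollment], axis=0)

print(is_enrolled_speaker(rng.standard_normal(16000), profile))
```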