Research Collection: The Unseen History of Audio and Acoustics Research at Microsoft

Published August 12, 2020


Audio and Acoustics Research at Microsoft

Getting the sound right is a crucial ingredient in natural user interfaces, immersive gaming, realistic virtual and mixed reality, and ubiquitous computing. Audio also plays an important role in assistive technologies for people who are blind or have low vision, and speech recognition and processing can help support those who are deaf or hard of hearing. Although computers have been capable of playing and processing high-fidelity audio for many decades, there are many frontiers left to explore in computational recognition, analysis and rendering of sound for speech or immersive sound fields.

A woman and a man setting up a dummy in an anechoic chamber.

Audio has been a key research area since Microsoft Research was founded in 1991. In its first year, researchers used audio data, along with other cues, to explore automatic summarization of audiovisual presentations. Over the years, researchers have made steady and significant advances in speech recognition, natural user interfaces, audio as a tool for collaboration and productivity, capturing and reproducing sound, spatial audio, acoustic simulation and audio analytics.

Many of these advances have shipped in Microsoft products and services like Windows 10, Kinect, HoloLens and Teams, as well as Ford’s SYNC in-car infotainment system, Polycom’s videoconferencing devices, and major game titles such as Gears of War, Sea of Thieves and Borderlands 3. Still more are working their way into future products and services, and into the hands of developers.

Use the timelines below to explore several threads of audio and acoustics research as they evolved from theories and experiments to real-world applications.

Speech recognition and natural user interfaces

2002
Microsoft researchers establish the Sound Capture and Speech Enhancement project

The Sound Capture and Speech Enhancement project begins to explore areas such as acoustic echo reduction, microphone array processing and noise reduction.
Publication Gain Self-Calibration Procedure for Microphone Arrays
This paper introduces one of the technologies that made microphone arrays feasible for manufacturing.
Publication A New Beamformer Design Algorithm for Microphone Arrays
Publication Reverberation Reduction for Better Speech Recognition
Publication Microphone Array Post-Processor Using Instantaneous Direction of Arrival
2007
Ford releases SYNC

Ford releases the first version of its SYNC in-car infotainment system, with a speech enhancement audio pipeline first designed by Microsoft researchers.
Video Natural Language Moves In-Car Infotainment Forward
February 2009
Publication Unified Framework for Single Channel Speech Enhancement
This paper introduced the parameter optimization approach used in Ford SYNC’s speech enhancement pipeline (August 2009).
2007
Windows support for microphone arrays

Microsoft releases Windows Vista, including support for four preselected microphone array geometries and standardized support for USB microphone arrays. Later, Windows 10 is updated to include support for microphone arrays with arbitrary geometry.
Publication Sound Capture and Processing: Practical Approaches
This book, published in July 2009, introduces multichannel acoustic echo cancellation, which later ships as part of Microsoft Kinect.
2010
Hands-free control in Kinect

Microsoft releases Kinect for Xbox 360, the first hands-free, open-microphone command-and-control product with surround sound echo cancellation.
Publication Beamformer Design Using Measured Microphone Directivity Patterns: Robustness to Modelling Error
Publication Optimal 3D Beamforming Using Measured Microphone Directivity Patterns
Video Sound Capture Applications in Entertainment and Gaming
Publication Data Driven Suppression Rule for Speech Enhancement
Publication Kinect Development Kit: A Toolkit for Gesture- and Speech-Based Human-Machine Interaction
2016

Microsoft releases HoloLens

Microsoft releases HoloLens, which contains a four-element microphone array and a sophisticated sound capture and speech enhancement system for capturing the voice of the wearer and the ambient sound environment.
2017
Researchers begin exploring neural networks for speech enhancement

In 2017, Microsoft researchers establish the Neural Networks-Based Speech Enhancement project, which aims for more accurate and reliable speech processing, particularly on mobile, wearable, smart home and IoT devices. Unlike previous devices, these present new challenges, such as noisier background environments, greater speaker-microphone distances and limited edge processing capabilities.
Publication A Causal Speech Enhancement Approach Combining Data-driven Learning and Suppression Rule Estimation
Publication A Hybrid Approach to Combining Conventional and Deep Learning Techniques for Single-channel Speech Enhancement and Recognition
Publication Convolutional-Recurrent Neural Networks for Speech Enhancement
Publication Constrained Convolutional-recurrent Networks to Improve Speech Quality with Low Impact on Recognition Accuracy
Publication Limiting Numerical Precision of Neural Networks to Achieve Real-time Voice Activity Detection
2019
Microsoft releases HoloLens 2

The device contains a five-element microphone array and sophisticated sound capture and speech enhancement system for capturing the voice of the wearer as well as the ambient sound environment. Researchers explored key components of its speech enhancement technology earlier in the year.
Publication Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement
Publication Acoustic Localization using Spatial Probability in Noisy and Reverberant Environments
2020

Speech enhancement incorporated into Microsoft Teams

Microsoft CEO Satya Nadella announces that new improvements to Microsoft Teams will include a neural network-based speech enhancement algorithm.

Audio for collaboration and productivity

1991

First audio-related paper published

Microsoft researchers publish their first audio-related paper, on the automatic summarization of multimedia presentations.

The slides, shown on the right, are synchronized with summary-segment transitions derived from the presentation on the left.
1996
Seeing the sound

In 1996, Microsoft researchers explore ways to use vision data to capture and render sound in interactive environments.
Publication Vision-Steered Audio for Interactive Environments
1999
Progress in audio detection and classification
Publication Detection of target speakers in audio databases
This paper introduces technology to detect individual speakers in audio, which is later implemented in Microsoft RoundTable.
Publication A Robust Audio Classification and Segmentation Method
This paper introduces a robust audio classification and segmentation method that distinguishes speech, music, environmental noise and silence.
2001
Project RingCam established

Microsoft researchers establish Project RingCam, to explore 360-degree videoconferencing.
Publication Distributed Meetings: A Meeting Capture and Broadcasting System
2007

Microsoft RoundTable ships with speaker detection technology

Speaker detection technology developed by Microsoft researchers ships as part of the Microsoft RoundTable conferencing system.

The technology is later sold to Polycom and released as the Polycom CX5000.

Capturing and reproducing sound

1998

Researchers begin experimenting with microphone arrays

Microsoft researchers build their first microphone array, using an Erector set.

One of the first microphone array prototypes, designed in the Signal Processing group by Rico Malvar and Dinei Florêncio in 1998.
2005

USB microphone array prototypes

Microsoft researchers establish the Audio Devices project, and build and evaluate two USB microphone array prototypes: a four-element linear array and an eight-element circular array.
2007
An anechoic chamber in Building 99

Microsoft Research Redmond moves into its new home in Building 99. The building includes the company’s first anechoic chamber.

Key publications from 2007:
Publication Robust Design of Wideband Loudspeaker Arrays
Publication Sound Capture System and Spatial Filter for Small Devices
2009

Anechoic chamber retrofitted to measure sound in 3D

The anechoic chamber in Building 99 is retrofitted to automatically measure 3D directivity and radiation patterns, including human spatial hearing. It uses a 3D scanner with sub-millimeter accuracy to measure the head and torso. Among other things, this supports research on head-related transfer functions (HRTFs), which can make spatial audio sound more realistic.

The Microsoft Research anechoic chamber set for measuring human spatial hearing.

Microsoft Campus Tours – Microsoft Research Part 1 – The Anechoic Chamber
2012

Progress in microphone arrays

Microsoft researchers build a spherical 16-channel microphone array and a cylindrical 16-channel microphone array to study sound field decomposition using spherical and cylindrical functions. In 2016, they build a 64-channel spherical microphone array.
2017
A new approach to gesture recognition

Publication Ultrasound-based Gesture Recognition
This paper introduces a new approach to gesture recognition using ultrasound waves, an approach that uses significantly less power than optical systems.

Figure 1: Left: Hardware set-up and close-up of the ultrasonic piezoelectric transducer at the center and an 8-element microphone array around it in a circular configuration.

Figure 2: Right: Block diagram of the proposed approach.
Publication Hardware and Algorithms for Ultrasonic Depth Imaging
Publication Multimodal Gesture Recognition
This paper further demonstrates live ultrasound sensing for gesture recognition.
2018

Live 360 audio and video streaming
2019

Project Denmark established

Microsoft researchers establish Project Denmark, which aims to achieve high-quality capture of meeting conversations using virtual microphone arrays composed of ordinary consumer devices such as mobile phones and laptops.

Spatial audio

2012
New directions for spatial audio

Microsoft researchers begin exploring new approaches to head-related transfer functions (HRTFs), which represent the acoustic transfer function from a sound source at a given location to a listener's eardrums. A potential consequence of this work is more realistic spatial audio that is tuned to the shape of the listener's head and torso.
Publication HRTF Magnitude Modeling Using a Non-Regularized Least-Squares Fit of Spherical Harmonics Coefficients on Incomplete Data
Publication HRTF Magnitude Synthesis via Sparse Representation of Anthropometric Features
This paper describes the HRTF personalization used in HoloLens.
Publication HRTF Phase Synthesis via Sparse Representation of Anthropometric Features
Blog Microsoft 3D audio tech makes virtual sounds sound real
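To make the HRTF definition above more concrete, here is a minimal sketch of how an HRTF is applied. Filtering a mono signal with the left-ear and right-ear head-related impulse responses (HRIRs, the time-domain form of the HRTF) measured for one source direction yields a two-channel binaural signal that, over headphones, appears to come from that direction. The function names `convolve` and `render_binaural` and the toy HRIR values are hypothetical and chosen only for illustration; real HRIRs are measured (for example in an anechoic chamber) or personalized per listener. This is not the rendering system used in Microsoft products.

```python
from typing import List, Tuple


def convolve(signal: List[float], impulse_response: List[float]) -> List[float]:
    """Direct-form linear convolution; the output has len(signal) + len(impulse_response) - 1 samples."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(
    mono: List[float], hrir_left: List[float], hrir_right: List[float]
) -> Tuple[List[float], List[float]]:
    """Render a mono source at one direction by filtering it with that direction's
    left-ear and right-ear HRIRs (the time-domain form of the HRTF)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


if __name__ == "__main__":
    # Hypothetical toy HRIRs for a source on the listener's left: the right ear
    # hears the sound two samples later (interaural time difference) and at half
    # the level (interaural level difference, standing in for head shadowing).
    hrir_left = [1.0]
    hrir_right = [0.0, 0.0, 0.5]
    left, right = render_binaural([1.0, 0.5], hrir_left, hrir_right)
    print(left)   # [1.0, 0.5]
    print(right)  # [0.0, 0.0, 0.5, 0.25]
```

Direct-form convolution keeps the sketch dependency-free, but its cost grows with the product of the signal and HRIR lengths, so real-time renderers commonly use block-based FFT convolution and interpolate between measured directions instead.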
2015
Virtual surround sound in Windows 10

Microsoft releases Windows 10 with support for virtual surround sound, marketed as Windows Sonic. This spatial audio rendering system is later released as part of HoloLens.
Publication Estimation of Multipath Propagation Delays and Interaural Time Differences from 3-D Head Scans
Publication Applications of 3D Spherical Transforms To Personalization Of Head-Related Transfer Functions
2016

Personalized audio rendering in HoloLens

Microsoft releases HoloLens. The device features an audio rendering system with on-the-fly personalization of the wearer’s spatial hearing.
2016
Microsoft releases the Windows Mixed Reality platform

Windows 10 includes support for virtual and mixed reality headsets manufactured by other companies. The platform contains an extended and improved version of the spatial audio engine.
Publication Head-related transfer function personalization for the needs of spatial audio in mixed and virtual reality
2017
A map delivered in 3D sound

Microsoft releases Soundscape (in collaboration with Guide Dogs UK), a helper app for people with visual impairments that includes a spatial audio rendering system. Read about the research behind the product.
Publication Blind reverberation time estimation using a convolutional neural network
Video Microsoft Soundscape: A Map Delivered in 3D Sound
2018
Podcast: Hearing in 3D with Dr. Ivan Tashev

In this podcast, Dr. Tashev provides an overview of the quest for better sound processing and speech enhancement, describes the latest innovations in 3D audio, and explains why the research behind audio processing technology is, thanks to variations in human perception, equal parts science, art and craft.

Key publications from 2018:
Publication A Sparsity Measure for Echo Density Growth in General Environments
Publication Blind Room Volume Estimation from Single-channel Noisy Speech
Publication Capture, representation, and rendering of 3D audio for virtual and augmented reality
Publication Improving Binaural Ambisonics Decoding by Spherical Harmonics Domain Tapering and Coloration Compensation
Publication Spectral manipulation improves elevation perception with non-individualized head-related transfer functions

Acoustic simulation

2010
Microsoft researchers establish Project Triton

Prior to 2010, a key challenge in interactive audio had been the fast modeling of wave effects in complex game scenes, such as smooth sound obstruction around doorways, or dynamic reverberation that responds to both source and listener motion. In the paper below, Microsoft researchers introduced the idea of precomputing physically accurate wave simulations, and showed that it was a viable path forward for interactive audio and games.

Project Triton explores a physics-based approach to modeling virtual environments, for more realistic in-game audio.
Publication Precomputed Wave Simulation for Real-Time Sound Propagation of Dynamic Sources in Complex Scenes
2012
Researchers begin collaboration with game studios

Microsoft researchers begin collaborating with The Coalition Studio to incorporate this acoustic simulation work into Gears of War, transitioning from exploratory research to a targeted redesign focused on performance and flexibility.
- 2013: The first working prototype of Project Triton is demonstrated internally.
- 2014: This paper describes the core design of Project Triton, which combines perceptual coding, spatial compression and parametric rendering. The design addresses the problem of system resource usage and integrates easily into existing audio tools. Later work builds on this core design with various improvements.
- 2015: A Microsoft Research summer intern develops a novel adaptive sampling approach to resolve a key robustness issue in Project Triton.
Publication Adaptive Sampling For Sound Propagation
Publication Parametric Wave Field Coding for Precomputed Sound Propagation
2016

Project Triton ships in Gears of War 4

Project Triton ships as part of Gears of War 4 – the first instance of game acoustics provided by accurate physics-based simulation.

GDC 2017 talk on Gears of War integration
2017
Project Triton in Virtual and Mixed Reality

After years of development and refinement for use in games, Project Triton is used in the Mixed Reality experience shipped as part of the Windows 10 Fall Creators Update. It provides a natural acoustic experience in the virtual "cliffhouse" space, with new directional acoustics features such as sound that is obstructed by virtual objects, or heard as if coming around corners or through doorways. This experience also incorporates advances in HRTFs described in the previous timeline.

In 2018, Project Triton ships as part of Sea of Thieves, the second game to incorporate this technology. The game includes custom modifications for evaluating acoustics modularly, illustrating the flexibility of the system.
Publication Parametric Directional Coding for Precomputed Sound Propagation
This SIGGRAPH paper describes improvements to Triton for encoding and rendering directional acoustic effects.
2019

Podcast: Project Triton and the Physics of Sound with Dr. Nikunj Raghuvanshi

In this podcast, Dr. Raghuvanshi wants you to hear how sound really travels – in rooms, around corners, behind walls, out doors – and he’s using computational physics to do it.
2019
Project Triton technology released as Project Acoustics

Microsoft makes Project Triton technology available to developers as Project Acoustics, including Unity and Unreal plugins for easy integration into games and research prototypes.
Video Project Acoustics: Making Waves with Triton
Talk Project Acoustics | Game Developers Conference 2019
- 2019: Gears of War 5 ships, with an immersive audio experience that combines headphone rendering technologies such as Windows Sonic and Dolby Atmos with Triton’s scene-informed sound propagation.
- 2019: Borderlands 3 ships, making its developer the first game studio outside Microsoft to employ Project Triton.
2020
Project Acoustics incorporated into HoloLens

This milestone marks the first demonstration of physical acoustics in augmented reality.
Publication Cloud-Enabled Interactive Sound Propagation for Untethered Mixed Reality
Talk Using Project Acoustics with HoloLens 2
2020

In this webinar, Microsoft Principal Researcher Dr. Nikunj Raghuvanshi covers the ins and outs of creating practical, high-quality sound simulations. He gives an overview of the three components of sound simulation (synthesis, propagation and spatialization), with a focus on Project Triton. For each component, he reviews the underlying physics, research techniques, practical considerations and open research questions.

Audio analytics

2010
Audio Analytics project established

Microsoft researchers establish the Audio Analytics project, to explore research directions such as extracting non-verbal cues from human speech, detecting specific audio events and background noise, and audio search and retrieval. Potential applications include customer satisfaction analysis from customer support calls, media content analysis and retrieval, medical diagnostic aids and patient monitoring, assistive technologies for people with hearing impairments, and audio analysis for public safety.
Publication A New Speaker Identification Algorithm for Gaming Scenarios
Publication Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine
Publication High-level Feature Representation using Recurrent Neural Network for Speech Emotion Recognition
2015
"Hey, Cortana" uses speaker identification

Microsoft releases Windows 10 with speaker identification as part of the “Hey, Cortana” wake-up feature.
Publication Learning Utterance-level Representations for Speech Emotion and Age/Gender Recognition Using Deep Neural Networks
Publication A Cross-modal Audio Search Engine based on Joint Audio-Text Embeddings
Publication Supervised Deep Hashing for Efficient Audio Event Retrieval


Research groups: Audio and Acoustics Research Group

Research collections: Audio and Acoustics Research at Microsoft
