Research Collection: The Unseen History of Audio and Acoustics Research at Microsoft

Audio and Acoustics Research at Microsoft

Getting the sound right is a crucial ingredient in natural user interfaces, immersive gaming, realistic virtual and mixed reality, and ubiquitous computing. Audio also plays an important role in assistive technologies for people who are blind or have low vision, and speech recognition and processing can help support those who are deaf or hard of hearing. Although computers have been capable of playing and processing high-fidelity audio for many decades, many frontiers remain in the computational recognition, analysis and rendering of sound, from speech to immersive sound fields.

audio and acoustics: woman and man setting up a dummy in anechoic chamber

Audio has been a key research area since Microsoft Research was founded in 1991 – in its first year, researchers used audio data as well as other cues to explore automatic summarization of audiovisual presentations. Over the years, there have been steady and significant research advances in speech recognition, natural user interfaces, audio as a tool for collaboration and productivity, capturing and reproducing sound, spatial audio, acoustic simulation and audio analytics.

Many of these advances have shipped in Microsoft products and services like Windows 10, Kinect, HoloLens and Teams, as well as Ford’s SYNC in-car infotainment system, Polycom’s videoconferencing devices, and major game titles such as Gears of War, Sea of Thieves and Borderlands 3. Still more are working their way into future products and services, and into the hands of developers.

Use the timelines below to explore several threads of audio and acoustics research as they evolved from theories and experiments to real-world applications.


Speech recognition and natural user interfaces

  1. 2002

    Microsoft researchers establish the Sound Capture and Speech Enhancement project

    The Sound Capture and Speech Enhancement project begins to explore areas such as acoustic echo reduction, microphone array processing and noise reduction. (An illustrative beamforming sketch appears after this timeline.)

  2. 2007

    Ford releases SYNC

    Ford releases the first version of its SYNC in-car infotainment system, with a speech enhancement audio pipeline first designed by Microsoft researchers.

    audio and acoustics: man standing in front of Experience Ford SYNC kiosk
  3. 2007

    Windows support for microphone arrays

    Microsoft releases Windows Vista, including support for four preselected microphone array geometries and standardized support for USB microphone arrays. Later, Windows 10 is updated to include support for microphone arrays with arbitrary geometry.

  4. 2010

    Microsoft releases Kinect

    Microsoft releases Kinect for Xbox 360, which includes a four-element microphone array and sound capture and speech enhancement technology developed by Microsoft researchers for voice control from across the living room.

  5. 2016

    Microsoft releases HoloLens

    Microsoft releases HoloLens, which contains a four-element microphone array and a sophisticated sound capture and speech enhancement system for capturing the voice of the wearer and the ambient sound environment.

    image of hands holding a HoloLens device
  6. 2017

    Researchers begin exploring neural networks for speech enhancement

    Microsoft researchers establish the Neural Networks-Based Speech Enhancement project, which aims for more accurate and reliable speech processing, particularly on mobile, wearable, smart home and IoT devices. Unlike earlier devices, these present new challenges such as noisier background environments, greater speaker-microphone distances and limited edge processing capabilities.

  7. 2019

    Microsoft releases HoloLens 2

    The device contains a five-element microphone array and a sophisticated sound capture and speech enhancement system for capturing the voice of the wearer as well as the ambient sound environment. Researchers explored key components of its speech enhancement technology earlier in the year.

  8. 2020

    Speech enhancement incorporated into Microsoft Teams

    Microsoft CEO Satya Nadella announces that new improvements to Microsoft Teams will include a neural network-based speech enhancement algorithm.
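
To make the microphone array processing thread above concrete, here is a minimal delay-and-sum beamformer – the textbook starting point for array-based sound capture. It is an illustrative sketch only, not the pipeline shipped in any Microsoft product, and the array geometry, sample rate and steering angle are made-up example values.

```python
# Illustrative delay-and-sum beamformer for a linear microphone array.
# Not the pipeline used in Microsoft products; geometry and rates are examples.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s


def delay_and_sum(frames: np.ndarray, mic_x: np.ndarray, angle_deg: float,
                  sample_rate: int) -> np.ndarray:
    """Steer a linear array toward angle_deg (0 = broadside).

    frames: (num_mics, num_samples) time-domain signals, one row per mic.
    mic_x:  (num_mics,) microphone positions along the array axis, in meters.
    """
    num_mics, num_samples = frames.shape
    # Relative arrival-time differences for a far-field source at the steering angle.
    delays = mic_x * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND

    # Compensate the delays in the frequency domain (fractional-sample shifts),
    # then average across microphones to reinforce sound from that direction.
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)
    phase_shifts = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    aligned = np.fft.irfft(spectra * phase_shifts, n=num_samples, axis=1)
    return aligned.mean(axis=0)


if __name__ == "__main__":
    # Example: a 4-element array with 4 cm spacing, steered 30 degrees off broadside.
    rng = np.random.default_rng(0)
    mic_x = np.arange(4) * 0.04
    frames = rng.standard_normal((4, 16000))   # stand-in for captured audio
    enhanced = delay_and_sum(frames, mic_x, angle_deg=30.0, sample_rate=16000)
    print(enhanced.shape)                      # (16000,)
```

A production system layers acoustic echo cancellation, adaptive beamforming and noise suppression on top of this basic align-and-average step.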


Audio for collaboration and productivity

  1. 1991

    Microsoft researchers publish their first audio-related paper, on the automatic summarization of multimedia presentations.

    audio and acoustics: 1991 software testing window UI
    The slides, shown at right, are synchronized with summary-segment transitions derived from the presentation at left.

  2. 1996

    Seeing the sound

    audio and acoustics: man standing in front on a large VR screen with hands up

    Microsoft researchers explore ways to use vision data to capture and render sound in interactive environments.

  3. 1999

    Progress in audio detection and classification

  4. 2001

    Project RingCam established

    Microsoft researchers establish Project RingCam to explore 360-degree videoconferencing.

    audio and acoustics: project RingCam video conference screen
  5. 2007

    Microsoft RoundTable ships with speaker detection technology

    photo of the Microsoft RoundTable video conferencing device

    Speaker detection technology developed by Microsoft researchers ships as part of the Microsoft RoundTable conferencing system.

    The technology is later sold to Polycom and released as the Polycom CX5000.


Capturing and reproducing sound

  1. 1998

    Researchers begin experimenting with microphone arrays

    Microsoft researchers build their first microphone array, using an Erector set.

    audio and acoustics: prototype of microphone array
    This is one of the first microphone array prototypes, designed in the Signal Processing group by Rico Malvar and Dinei Florencio in 1998.
  2. 2005

    USB microphone array prototypes

    Microsoft researchers establish the Audio Devices project, and build and evaluate two USB microphone array prototypes: a four-element linear array and an eight-element circular array.

  3. 2007

    An anechoic chamber in Building 99

    Microsoft Research Redmond moves into its new home in Building 99. The building includes the company’s first anechoic chamber.

    view of inside the anechoic chamber


  4. 2009

    Anechoic chamber retrofitted to measure sound in 3D

    The anechoic chamber in Building 99 is retrofitted to automatically measure 3D directivity and radiation patterns, including human spatial hearing. It uses a 3D scanner with sub-millimeter accuracy to measure the head and torso. Among other things, this advances work on head-related transfer functions (HRTFs), which can make spatial audio sound more realistic.

    The Microsoft Research anechoic chamber set for measuring human spatial hearing.
    Video: Microsoft Campus Tours – Microsoft Research Part 1 – The Anechoic Chamber
  5. 2012

    Progress in microphone arrays

    Microsoft researchers build a spherical 16-channel microphone array and a cylindrical 16-channel microphone array to study sound field decomposition using spherical and cylindrical functions. In 2016, they build a 64-channel spherical microphone array. (The textbook expansion behind this kind of decomposition is sketched after this timeline.)

  6. 2017

    A new approach to gesture recognition

    Ultrasound-based Gesture Recognition – This paper introduces a new approach to gesture recognition based on ultrasound waves, which consumes significantly less power than optical systems.

    audio and acoustics: figures showing ultrasound-based gesture recognition setup
    Left: the hardware setup, with an ultrasonic piezoelectric transducer at the center and an 8-element circular microphone array around it. Right: block diagram of the proposed approach.
  7. 2018

    Live 360 audio and video streaming

  8. 2019

    Project Denmark established

    Microsoft researchers establish Project Denmark, which aims to achieve high-quality capture of meeting conversations using virtual microphone arrays composed of ordinary consumer devices such as mobile phones and laptops.
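
For readers unfamiliar with the "spherical functions" mentioned in the 2012 entry, the sketch below states the textbook interior expansion of a sound pressure field that spherical-array processing builds on; it is standard background, not a formula taken from the researchers' papers.

```latex
% Interior expansion of the pressure field about the array center:
% j_n are spherical Bessel functions, Y_n^m are spherical harmonics,
% and k is the wavenumber. A spherical microphone array estimates the
% coefficients a_{nm}(k) from pressure samples on its surface; the
% cylindrical case is analogous, with cylindrical Bessel functions.
p(r, \theta, \phi, k) \;=\; \sum_{n=0}^{\infty} \sum_{m=-n}^{n}
  a_{nm}(k)\, j_n(kr)\, Y_n^m(\theta, \phi)
```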


Spatial audio

  1. 2012

    New directions for spatial audio

    Microsoft researchers begin exploring new approaches to head-related transfer functions (HRTFs), which represent the acoustic transfer function from a sound source at a given location to the eardrums of a listener. A potential consequence of this work is more realistic spatial audio that is tuned to the shape of the listener’s head and torso. (A minimal HRTF rendering sketch appears after this timeline.)

  2. 2015

    Virtual surround sound in Windows 10

    Microsoft releases Windows 10 with support for virtual surround sound, marketed as Windows Sonic. This spatial audio rendering system is later released as part of HoloLens.

  3. 2016

    Personalized audio rendering in HoloLens

    Microsoft releases HoloLens. The device features an audio rendering system that is personalized on the fly to the wearer’s spatial hearing.

  4. 2016

    Microsoft releases the Windows Mixed Reality platform

    Windows 10 includes support for virtual and mixed reality headsets manufactured by other companies. The platform contains an extended and improved version of the spatial audio engine.

  5. 2017

    A map delivered in 3D sound

    Man holding a smart phone, standing next to his guide dog

    Microsoft releases Soundscape (in collaboration with Guide Dogs UK), an app that uses a spatial audio rendering system to help people who are blind or have low vision navigate their surroundings.

  6. 2018

    Podcast: Hearing in 3D with Dr. Ivan Tashev

    Ivan Tashev podcast

    In this podcast, Dr. Tashev provides an overview of the quest for better sound processing and speech enhancement, describes the latest innovations in 3D audio, and explains why the research behind audio processing technology is, thanks to variations in human perception, equal parts science, art and craft.

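As background for the HRTF work above, here is a minimal sketch of how an HRTF pair is applied once it has been measured: the mono source is convolved with the left-ear and right-ear head-related impulse responses (HRIRs) for the desired direction. The arrays below are random stand-ins; a real renderer loads measured or personalized HRIRs and interpolates between directions, which this sketch omits.

```python
# Minimal HRTF-based binaural rendering sketch: convolve a mono source with a
# left/right HRIR pair for one direction. HRIRs here are random stand-ins.
import numpy as np
from scipy.signal import fftconvolve


def render_binaural(mono: np.ndarray, hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Return a (num_samples, 2) stereo signal for headphone playback."""
    left = fftconvolve(mono, hrir_left, mode="full")
    right = fftconvolve(mono, hrir_right, mode="full")
    return np.stack([left, right], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(48000)        # stand-in for a mono sound
    hrir_l = rng.standard_normal(256) * 0.01   # stand-ins for measured HRIRs
    hrir_r = rng.standard_normal(256) * 0.01
    binaural = render_binaural(source, hrir_l, hrir_r)
    print(binaural.shape)                      # (48255, 2)
```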


Acoustic simulation

  1. 2010

    Microsoft researchers establish Project Triton

    Prior to 2010, a key challenge in interactive audio had been the fast modeling of wave effects in complex game scenes, such as smooth sound obstruction around doorways and dynamic reverberation that responds to both source and listener motion. In a 2010 paper, Microsoft researchers introduced the idea of pre-computing physically accurate wave simulations and showed that it was a viable path forward for interactive audio and games.

    Project Triton explores a physics-based approach to modeling how sound propagates in virtual environments, for more realistic in-game audio.

  2. 2012

    Researchers begin collaboration with game studios

    Microsoft researchers begin collaborating with The Coalition to incorporate this acoustic simulation work into Gears of War, transitioning from exploratory research to a targeted redesign focused on performance and flexibility.

    • 2013: The first working prototype of Project Triton is demonstrated internally.
    • 2014: A paper describes the core design of Project Triton, combining perceptual coding, spatial compression and parametric rendering. The design keeps system resource usage practical and integrates easily into existing audio tools; later work builds on this core design with various improvements. (A toy sketch of the precompute-and-look-up idea appears after this timeline.)
    • 2015: A Microsoft Research summer intern develops a novel adaptive sampling approach that resolves a key robustness issue in Project Triton.
  3. 2016

    Project Triton ships in Gears of War 4

    Project Triton ships as part of Gears of War 4 – the first instance of game acoustics provided by accurate physics-based simulation.

    GDC 2017 talk on Gears of War integration
  4. 2017

    Project Triton in Virtual and Mixed Reality

    Screenshot of the Mixed Reality experience in Windows 10

    After years of development and refinement for use in games, Project Triton is used in the Mixed Reality experience shipped as part of the Windows 10 Fall Creators Update. It provides a natural acoustic experience in the virtual “cliffhouse” space, with new directional acoustics features such as sound that is obstructed by virtual objects, or heard as if coming around corners or through doorways. This experience also incorporates advances in HRTFs described in the previous timeline.

    In 2018, Project Triton ships as part of Sea of Thieves, the second game to incorporate this technology. The game includes custom modifications for evaluating acoustics modularly, illustrating the flexibility of the system.

  5. 2019

    Podcast: Project Triton and the Physics of Sound with Dr. Nikunj Raghuvanshi

    Nikunj Raghuvanshi

    In this podcast, Dr. Raghuvanshi wants you to hear how sound really travels – in rooms, around corners, behind walls, out doors – and he’s using computational physics to do it.
  6. 2019

    Project Triton technology released as Project Acoustics

    Microsoft makes Project Triton technology available to developers as Project Acoustics, including Unity and Unreal plugins for easy integration into games and research prototypes.

    • 2019: Gears of War 5 ships, with an immersive audio experience that combines headphone rendering technologies such as Windows Sonic and Dolby Atmos with Triton’s scene-informed sound propagation.
    • 2019: Borderlands 3 ships – the first game from a studio outside Microsoft to employ Project Triton.
    Screenshot of Borderlands 3 game

  7. 2020

    Project Acoustics incorporated into HoloLens

    This milestone marks the first demonstration of physical acoustics in augmented reality.

  8. 2020

    Webinar with Nikunj Raghuvanshi

    In this webinar, Microsoft Principal Researcher Dr. Nikunj Raghuvanshi covers the ins and outs of creating practical, high-quality sound simulations, walking through the three components of sound simulation – synthesis, propagation and spatialization – with a focus on Project Triton. For each, he reviews the underlying physics, research techniques, practical considerations and open research questions.
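
To illustrate the precompute-then-look-up idea that runs through this timeline, here is a toy sketch: an offline wave simulation is reduced to a few perceptual parameters stored per listener/source cell, and the runtime simply looks them up and applies them (real systems also interpolate between cells). The grid, parameter names and values are invented for illustration and are not the Project Acoustics data format or API.

```python
# Toy illustration of precompute-then-look-up acoustics: an offline simulation
# is baked down to a direct-path gain (dB) and a reverberation decay time (s)
# per listener/source cell. Grid, names and values are invented for this sketch.
import numpy as np

CELL_SIZE = 1.0  # meters per grid cell (illustrative)

# Precomputed tables indexed by (listener_cell, source_cell); in a real system
# these would come from offline wave simulation of the game scene.
direct_gain_db = {((0, 0), (3, 0)): -2.0,   # open line of sight
                  ((0, 0), (3, 4)): -14.0}  # source around a corner
decay_time_s = {((0, 0), (3, 0)): 0.4,
                ((0, 0), (3, 4)): 1.1}


def to_cell(position):
    """Quantize a 2-D position (meters) to a grid cell."""
    return tuple(int(np.floor(c / CELL_SIZE)) for c in position)


def acoustic_params(listener_pos, source_pos):
    """Look up the baked parameters for this listener/source pair."""
    key = (to_cell(listener_pos), to_cell(source_pos))
    return direct_gain_db[key], decay_time_s[key]


def apply_gain(samples: np.ndarray, gain_db: float) -> np.ndarray:
    """Runtime rendering step: scale the dry signal by the baked gain.
    (A real renderer would also drive a reverb with the decay time.)"""
    return samples * (10.0 ** (gain_db / 20.0))


if __name__ == "__main__":
    gain, decay = acoustic_params((0.2, 0.7), (3.5, 4.2))
    print(gain, decay)   # -14.0 1.1 -> quieter, more reverberant around the corner
```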


Audio analytics

  1. 2010

    Audio Analytics project established

    Microsoft researchers establish the Audio Analytics project to explore research directions such as extracting non-verbal cues from human speech, detecting specific audio events and background noise, and audio search and retrieval. Potential applications include customer satisfaction analysis from customer support calls, media content analysis and retrieval, medical diagnostic aids and patient monitoring, assistive technologies for people with hearing impairments, and audio analysis for public safety. (A toy event-detection sketch appears after this timeline.)

  2. 2015

    “Hey, Cortana” uses speaker identification

    Microsoft releases Windows 10 with speaker identification as part of the “Hey, Cortana” wake-up feature.
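
As a concrete example of the audio-event-detection direction mentioned above, here is a toy energy-based detector that flags stretches of a recording whose short-time energy rises well above an estimated noise floor. Real audio analytics systems use learned classifiers over spectral features; the frame size and thresholds below are arbitrary choices for illustration.

```python
# Toy energy-based audio event detector: flag frames whose short-time energy
# rises well above an estimated noise floor. Thresholds are arbitrary examples.
import numpy as np


def detect_events(samples: np.ndarray, sample_rate: int,
                  frame_ms: float = 32.0, threshold_db: float = 10.0):
    """Return a list of (start_sec, end_sec) spans of candidate audio events."""
    frame_len = int(sample_rate * frame_ms / 1000.0)
    num_frames = len(samples) // frame_len
    frames = samples[:num_frames * frame_len].reshape(num_frames, frame_len)

    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.percentile(energy_db, 10)          # rough noise estimate
    active = energy_db > noise_floor + threshold_db

    events, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            events.append((start * frame_len / sample_rate,
                           i * frame_len / sample_rate))
            start = None
    if start is not None:
        events.append((start * frame_len / sample_rate,
                       len(samples) / sample_rate))
    return events


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(sr * 3) * 0.01           # 3 s of quiet noise
    audio[sr:sr + sr // 2] += np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
    print(detect_events(audio, sr))                      # roughly [(1.0, 1.5)]
```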
