Audio and acoustics

Publication

Natural Language Supervision For General-Purpose Audio Representations

Benjamin Elizalde, Soham Deshmukh, Huaming Wang

2024 International Conference on Acoustics, Speech, and Signal Processing | April 2024

Publication

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Shujie Hu, Long Zhou, Shujie Liu, Sanyuan Chen, Hongkun Hao, Jing Pan, Xunying Liu, Jinyu Li, Sunit Sivasankaran, Linquan Liu, Furu Wei

March 2024

Project

CoVoMix

Advancing Zero-shot Speech Generation for Human-like Multi-talker Conversation

We introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. In addition, we devise a comprehensive set of metrics…

Podcast

What’s Your Story: Ivan Tashev

February 1, 2024

Partner Software Architect Ivan Tashev talks about applying his expertise in audio signal processing to the design and study of audio components for Microsoft products such as Kinect and shares how a focus on what…

Microsoft Research Blog

Research at Microsoft 2023: A year of groundbreaking AI advances and discoveries

December 22, 2023

AI saw unparalleled growth in 2023, reaching millions daily. This progress owes much to the extensive work of Microsoft researchers and collaborators. In this review, learn about the advances in 2023, which set the stage…

Video

Synchronized Audio-Visual Generation with a Joint Generative Diffusion Model and Contrastive Loss

November 6, 2023 | Ruihan Yang

The rapid development of deep learning techniques has led to significant advancements in the fields of multimedia generation and synthesis. However, generating coherent and temporally aligned audio and video content remains a challenging task due…

46:11

Video

Binaural spatial audio positioning in video calls

October 4, 2023 | Jeremy Hyrkas

Spatially separating voices plays a crucial role in speech intelligibility, speaker identification and cognitive load in conversations. Voices are naturally separated in in-person conversations, but in most video conferencing software voices are mixed down to…

01:03:57

Publication

Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

Yung-Hsuan Lai, Yen-Chun Chen, Yu-Chiang Frank Wang

NeurIPS 2023 | October 2023

Publication

Spatio-Temporal Windowing for Encoding Perceptually Salient Early Reflections in Parametric Spatial Audio Rendering

Tobias Jüterbock, Fabian Brinkmann, Hannes Gamper, Nikunj Raghuvanshi, Stefan Weinzierl

Journal of the Audio Engineering Society | October 2023, Vol 71(10)

Career Opportunity

Undergraduate Research Internship – Computing

Posted: September 28, 2023

Location: United States

Research Area(s): Algorithms, Artificial intelligence, Audio and Acoustics, Computer vision, Data platforms and analytics, Ecology and environment, Economics, Graphics and multimedia, Hardware and devices, Human language technologies, Human-computer interaction, Mathematics, Medical, health and genomics, Programming languages and software engineering, Search and information retrieval, Security, privacy, and cryptography, Social sciences, Systems and networking

This program is for candidates who are passionate about technology and offer diverse perspectives. We don’t just value differences, we seek them out. Interns put inquiry and theory into practice. Alongside doctoral interns, and some…