Deep Noise Suppression Challenge – INTERSPEECH 2021

Region: Global

Program dates: January 2020 – March 2021

The Deep Noise Suppression (DNS) challenge is designed to foster innovation in noise suppression to achieve superior perceptual speech quality. We recently organized DNS challenge special sessions at INTERSPEECH 2020 and ICASSP 2021, for which we open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T Recommendation P.808, which was used to evaluate challenge submissions. Many researchers from academia and industry made significant contributions that pushed the field forward, yet even the best noise suppressors were far from achieving superior speech quality in challenging scenarios. In this edition of the challenge, organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate fullband scenarios. The two tracks in this challenge focus on real-time denoising for (i) wideband and (ii) fullband scenarios. We are also making available DNSMOS, a reliable non-intrusive objective speech quality metric for wideband audio, for participants to use during their development phase.

Challenge description (PDF)

We will have two tracks in this challenge:

  • Track 1: Real-Time Denoising track for the wideband scenario
    Computation: the noise suppressor must take less than the stride time Ts (in ms) to process a frame of size T (in ms) on an Intel Core i5 quad-core machine clocked at 2.4 GHz, or an equivalent processor. For example, Ts = T/2 for 50% overlap between frames.

    Latency: the total algorithmic latency, including the frame size T, the stride time Ts, and any look-ahead, must be less than or equal to 40 ms. For example, for a real-time system that receives 20 ms audio chunks, a frame length of 20 ms with a stride of 10 ms gives an algorithmic latency of 30 ms, which satisfies the requirement. A frame size of 32 ms with a stride of 16 ms gives an algorithmic latency of 48 ms, which exceeds 40 ms and therefore does not satisfy the requirement. If the frame size plus stride, T1 = T + Ts, is less than 40 ms, you can use up to (40 - T1) ms of future information (look-ahead).
  • Track 2: Real-Time Denoising track for the fullband scenario
    Same requirements as Track 1, but at a 48 kHz sampling rate.
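The timing and latency rules above can be checked mechanically before submission. Below is a minimal Python sketch, assuming a frame-based suppressor described only by its frame size T, stride Ts, and optional look-ahead. `FrameConfig`, `meets_realtime`, and `mean_frame_time_ms` are hypothetical helpers, not part of any official challenge tooling, and the 16 kHz rate used for the wideband track is an assumption (only the 48 kHz fullband rate is stated in the rules).

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Sequence

# Algorithmic latency budget shared by Track 1 and Track 2 (ms).
MAX_ALGORITHMIC_LATENCY_MS = 40.0


@dataclass
class FrameConfig:
    """Hypothetical description of a frame-based noise suppressor's framing."""

    frame_ms: float  # frame size T
    stride_ms: float  # stride time Ts (hop between consecutive frames)
    lookahead_ms: float = 0.0  # any future context used beyond T + Ts

    def algorithmic_latency_ms(self) -> float:
        # Total algorithmic latency = T + Ts + look-ahead.
        return self.frame_ms + self.stride_ms + self.lookahead_ms

    def satisfies_latency(self) -> bool:
        # The rules allow a total of at most 40 ms (inclusive).
        return self.algorithmic_latency_ms() <= MAX_ALGORITHMIC_LATENCY_MS

    def max_lookahead_ms(self) -> float:
        # With T1 = T + Ts, up to (40 - T1) ms of look-ahead remains; 0 if T1 >= 40.
        return max(0.0, MAX_ALGORITHMIC_LATENCY_MS - (self.frame_ms + self.stride_ms))

    def samples(self, sample_rate_hz: int) -> tuple[int, int]:
        # Frame and stride lengths in samples at the given sampling rate.
        return (
            round(self.frame_ms * sample_rate_hz / 1000),
            round(self.stride_ms * sample_rate_hz / 1000),
        )


def meets_realtime(per_frame_ms: float, cfg: FrameConfig) -> bool:
    # Processing one frame must take strictly less than the stride time Ts.
    return per_frame_ms < cfg.stride_ms


def mean_frame_time_ms(
    process: Callable[[Sequence[float]], object],
    frame: Sequence[float],
    runs: int = 100,
) -> float:
    # Average wall-clock time (ms) of `process` on one frame; only indicative
    # unless run on the reference machine described in the rules.
    start = time.perf_counter()
    for _ in range(runs):
        process(frame)
    return (time.perf_counter() - start) * 1000.0 / runs


if __name__ == "__main__":
    ok = FrameConfig(frame_ms=20.0, stride_ms=10.0)
    print(ok.algorithmic_latency_ms(), ok.satisfies_latency(), ok.max_lookahead_ms())
    # 30.0 True 10.0
    too_long = FrameConfig(frame_ms=32.0, stride_ms=16.0)
    print(too_long.algorithmic_latency_ms(), too_long.satisfies_latency())
    # 48.0 False
    print(ok.samples(16_000), ok.samples(48_000))
    # (320, 160) (960, 480)
```

The two configurations mirror the 20 ms/10 ms and 32 ms/16 ms examples from the Track 1 description. Timings measured with `mean_frame_time_ms` on a different machine are only a rough guide to whether the real-time constraint holds on the reference processor.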

Participants are forbidden from using the blind test set to retrain or tweak their models. Participants must submit results only if they intend to submit a paper to INTERSPEECH 2021. Failing to adhere to these rules will lead to disqualification from the challenge.

Registration

Please send an email to [email protected] stating that you are interested in participating in the challenge. Please include the following details in your email:

  • List of participants
  • Affiliation of each participant
  • Email address of each participant

Also, please create a new submission at https://cmt3.research.microsoft.com/3rdDNSChallenge and fill in all the details. This will help us send out announcements easily.

Contact us: If you have questions about this program, email us at [email protected].