A multichannel feature compensation approach for robust ASR in noisy and reverberant environments

The REVERB Workshop

In this paper, we propose a multichannel feature compensation approach for automatic speech recognition (ASR) in reverberant and noisy environments. The proposed technique propagates the posterior of the clean signal, estimated by a multichannel Wiener filter in the short-time Fourier transform (STFT) domain, into the Mel-frequency cepstral coefficient (MFCC) domain. The multichannel Wiener filter reduces both reverberation and additive noise. Furthermore, we approximate the propagation of the prior distributions of speech and interference through the inverse STFT and a subsequent STFT with a different time-frequency resolution. This allows us to derive a multichannel minimum mean square error (MMSE) MFCC estimator whose STFT resolution differs from the resolution used in the speech enhancement stage. The proposed approach outperforms a multichannel short-time spectral amplitude estimation approach on both the clean-training and multi-condition-training ASR tasks of the REVERB challenge.
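As a rough illustration of the first stage, the sketch below computes the posterior mean and variance of a clean STFT coefficient for a single time-frequency bin with a multichannel Wiener filter. It is not the paper's implementation: the rank-one signal model y = d s + n, the spatially white interference, and the names `mwf_posterior`, `d`, `phi_s`, and `phi_n` are all assumptions made for this example. Under a zero-mean complex Gaussian model, the Wiener filter output is the posterior mean, and the accompanying posterior variance is the quantity a feature compensation scheme would carry forward into the MFCC domain.

```python
def mwf_posterior(y, d, phi_s, phi_n):
    """Posterior mean and variance of the clean STFT coefficient s in one bin.

    Assumed model (illustrative only):
        y = d * s + n,  s ~ CN(0, phi_s),  n ~ CN(0, phi_n * I)
    y     : list of complex microphone observations, one per channel
    d     : list of complex transfer coefficients, one per channel
    phi_s : speech power spectral density (real, >= 0)
    phi_n : interference power spectral density per channel (real, > 0)
    """
    if len(y) != len(d):
        raise ValueError("y and d must have the same number of channels")

    # Squared norm of the transfer vector, ||d||^2.
    gain = sum(abs(dm) ** 2 for dm in d)
    denom = phi_n + phi_s * gain

    # Wiener weights w = Phi_y^{-1} phi_s d, which for spatially white
    # interference reduces to phi_s * d / (phi_n + phi_s * ||d||^2).
    w = [phi_s * dm / denom for dm in d]

    # Posterior mean: w^H y.
    mean = sum(wm.conjugate() * ym for wm, ym in zip(w, y))

    # Posterior variance: phi_s - phi_s^2 d^H Phi_y^{-1} d.
    var = phi_s * phi_n / denom
    return mean, var


# Two-channel example with a noise-free observation of s = 2 - 1j.
d = [1 + 0j, 1j]
s = 2 - 1j
y = [dm * s for dm in d]
mean, var = mwf_posterior(y, d, phi_s=1.0, phi_n=1.0)
```

With more realistic interference, the scalar `phi_n` would be replaced by a full spatial covariance matrix and the closed form above by a matrix inverse.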