Convolutional-Recurrent Neural Networks for Speech Enhancement
- Han Zhao
- Shuayb Zarar
- Ivan Tashev
- Chin-Hui Lee
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We propose a novel, end-to-end model based on convolutional and recurrent neural networks for speech enhancement. Our model is purely data-driven: it does not make any assumptions about the type or the stationarity of the noise. In contrast to existing methods that use multilayer perceptrons (MLPs), we employ both convolutional and recurrent neural network architectures. Thus, our approach allows us to exploit local structures in both the spatial and temporal domains. By incorporating prior knowledge of speech signals into the design of model structures, we build a model that is more data-efficient and achieves better generalization on both seen and unseen noise. Based on experiments with synthetic data, we demonstrate that our model outperforms existing methods, improving PESQ by up to 0.6 on seen noise and 0.64 on unseen noise.
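To make the architecture class concrete, below is a minimal PyTorch sketch of a convolutional-recurrent enhancement model of the kind the abstract describes: a convolutional front-end over the noisy magnitude spectrogram, a recurrent layer over frames, and a frame-wise output. All layer sizes, kernel shapes, the bidirectional LSTM, and the mask-based output are illustrative assumptions, not the authors' configuration; the paper specifies the actual model.

```python
# Illustrative sketch only: layer sizes, kernels, and the soft-mask output
# are assumptions, not the architecture from the paper.
import torch
import torch.nn as nn

class ConvRecurrentEnhancer(nn.Module):
    def __init__(self, n_freq_bins=161, conv_channels=16, rnn_hidden=128):
        super().__init__()
        # Convolutional front-end: captures local time-frequency structure
        # of the noisy magnitude spectrogram (treated as a 1-channel image).
        self.conv = nn.Sequential(
            nn.Conv2d(1, conv_channels, kernel_size=(5, 5), padding=(2, 2)),
            nn.ReLU(),
            nn.Conv2d(conv_channels, conv_channels, kernel_size=(5, 5), padding=(2, 2)),
            nn.ReLU(),
        )
        # Recurrent layer: models temporal dynamics across frames.
        self.rnn = nn.LSTM(
            input_size=conv_channels * n_freq_bins,
            hidden_size=rnn_hidden,
            batch_first=True,
            bidirectional=True,
        )
        # Frame-wise projection back to frequency bins; a sigmoid yields a
        # soft mask applied to the noisy input (one common output choice).
        self.fc = nn.Linear(2 * rnn_hidden, n_freq_bins)

    def forward(self, noisy_mag):
        # noisy_mag: (batch, time, freq) magnitude spectrogram
        x = noisy_mag.unsqueeze(1)             # (batch, 1, time, freq)
        x = self.conv(x)                       # (batch, C, time, freq)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.rnn(x)                     # (batch, time, 2*hidden)
        mask = torch.sigmoid(self.fc(x))       # (batch, time, freq)
        return mask * noisy_mag                # enhanced magnitude estimate

# Example: enhance a batch of 100-frame spectrograms with 161 bins.
model = ConvRecurrentEnhancer()
noisy = torch.randn(4, 100, 161).abs()
enhanced = model(noisy)                        # same shape as input
```

Training such a model end-to-end on pairs of noisy and clean spectrograms (e.g., with an MSE loss on magnitudes) matches the data-driven setup described above, with no explicit noise model.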
© IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.