DNN-based Causal Voice Activity Detector
- Ivan Tashev
- Seyedmahdad Mirsamadi
Information Theory and Applications Workshop
Published by University of California, San Diego
Voice activity detectors (VADs) are important components of audio processing algorithms. In general, a VAD is a binary classifier that flags the audio frames containing voice activity. Most VADs are based on signal energy and build statistical models of the background noise and the speech signal. To keep the derivation tractable, these models must be simplified, and this simplification limits classification accuracy; more precise, but also more complex, statistical models make an analytical solution practically impossible to derive. In this paper, we propose using a deep neural network (DNN) to learn the relationship between noisy speech features and the correct VAD decision. Most applications require a causal algorithm, i.e., one that works in real time and uses only current and past audio samples. We therefore use audio segments that consist only of the current and previous audio frames, which makes real-time implementation possible. The proposed algorithm and DNN structure outperform the classic statistical-model-based VAD for both seen and unseen noise types.
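The causal segmentation idea can be illustrated with a minimal sketch. The abstract does not specify the input features, the number of past frames, or the network topology, so everything below is an assumption: per-frame feature vectors are taken as already computed, frames before the start of the signal are zero-padded, and the classifier is a hypothetical, untrained one-hidden-layer network. The names `causal_segments`, `TinyVadMlp`, and `causal_vad` are illustrative only and do not come from the paper. The point is the causal property: the decision for frame t depends only on frames t-C through t, where C is the context length.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def causal_segments(features: Sequence[Vector], context: int) -> List[Vector]:
    """Stack each frame with its `context` previous frames (no look-ahead).

    Assumption: frames before the signal start are zero-padded, so segment t
    depends only on frames t-context .. t.
    """
    dim = len(features[0])
    zero = [0.0] * dim
    segments = []
    for t in range(len(features)):
        seg: Vector = []
        for k in range(t - context, t + 1):  # oldest frame first, current frame last
            seg.extend(features[k] if k >= 0 else zero)
        segments.append(seg)
    return segments


def dense(x: Vector, w: List[Vector], b: Vector) -> Vector:
    """Affine layer: y[j] = sum_i w[j][i] * x[i] + b[j]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyVadMlp:
    """Hypothetical one-hidden-layer classifier; weights are random, NOT trained."""

    def __init__(self, in_dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(in_dim), 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.gauss(0.0, s1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, s2) for _ in range(hidden)]]
        self.b2 = [0.0]

    def speech_probability(self, segment: Vector) -> float:
        h = [max(0.0, v) for v in dense(segment, self.w1, self.b1)]  # ReLU hidden layer
        return sigmoid(dense(h, self.w2, self.b2)[0])  # P(voice) for this frame


def causal_vad(features: Sequence[Vector], model: TinyVadMlp,
               context: int, threshold: float = 0.5) -> List[bool]:
    """Return one voice-activity flag per frame, using only current and past frames."""
    return [model.speech_probability(s) > threshold
            for s in causal_segments(features, context)]


if __name__ == "__main__":
    frames = [[0.1 * t, 1.0, -0.5] for t in range(6)]  # hypothetical 3-dim features
    model = TinyVadMlp(in_dim=3 * (2 + 1), hidden=8)
    print(causal_vad(frames, model, context=2))
```

Because each segment contains only the current and earlier frames, a decision can be emitted as soon as the current frame arrives, at the cost of never seeing upcoming speech onsets. In practice the weights would be learned from labeled noisy speech; the random initialization here only makes the sketch runnable.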