Offline Voice Activity Detector Using Speech Supergaussianity
Information Theory and Applications Workshop
Published by University of California - San Diego
Voice Activity Detectors (VADs) play an important role in audio processing algorithms. Most VAD algorithms are designed to be causal, i.e., to work in real time using only current and past audio samples. Offline processing, where the entire voice utterance is available, allows different types of approaches that increase precision. In this paper we propose an algorithm for offline VAD based on the difference between the probability density functions (PDFs) of speech and noise. While a Gaussian distribution is a very good model for noise, the speech PDF is peakier, i.e., supergaussian. The proposed VAD algorithm works in the frequency domain and estimates the speech presence probability for each frequency bin in each audio frame and the speech presence probability for each frame, and also provides a binary decision per bin and per frame. It provides improved precision compared to streaming real-time VAD algorithms.
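The contrast between a Gaussian noise PDF and a peakier speech PDF can be illustrated with a small numerical sketch. It is not the algorithm proposed in the paper: it only measures excess kurtosis, one common indicator of supergaussianity, on synthetic samples. The Laplacian distribution used as a stand-in for speech, the sample count, and the helper name `excess_kurtosis` are all assumptions made for illustration.

```python
import numpy as np


def excess_kurtosis(x):
    """Sample excess kurtosis: about 0 for Gaussian data, positive for peakier PDFs."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0


rng = np.random.default_rng(0)
n = 200_000  # hypothetical sample count, large enough for a stable estimate

# Gaussian samples stand in for noise, as the abstract describes.
noise_like = rng.normal(0.0, 1.0, n)
# Laplacian samples stand in for speech (an assumption; the abstract only
# says the speech PDF is peakier than a Gaussian).
speech_like = rng.laplace(0.0, 1.0, n)

k_noise = excess_kurtosis(noise_like)
k_speech = excess_kurtosis(speech_like)

print(f"excess kurtosis, Gaussian (noise-like):   {k_noise:.3f}")
print(f"excess kurtosis, Laplacian (speech-like): {k_speech:.3f}")
```

A clearly positive excess kurtosis for the speech-like samples, against a value near zero for the Gaussian ones, is the kind of statistical difference a supergaussianity-based detector could exploit. A real detector would work on frequency-domain coefficients of recorded audio rather than on synthetic draws.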