Statistical Modeling of the Speech Signal

Ivan Tashev; Alex Acero


International Workshop on Acoustic, Echo, and Noise Control (IWAENC), Tel Aviv, Israel | September 2010

The Gaussian distribution is the most commonly used statistical model of the speech signal. In this paper we propose a more general statistical model for the distributions of the real and imaginary parts of the speech signal's DFT coefficients and of their magnitudes. Using experimental measurements on the TIMIT database, we show that the Generalized Gaussian Distribution holds well across frequencies and audio frame sizes. We propose a Weibull distribution to model the statistical behavior of the speech signal amplitude in the frequency domain. The distribution parameters estimated from the experimental measurements agree well with the distributions of the real and imaginary parts. We propose and evaluate several statistical models of varying complexity. Overall, these models fit the actual measurements with a Jensen-Shannon divergence below 0.0012 for the real and imaginary parts and below 0.003 for the magnitudes. The results presented in this paper can be applied to improve speech processing algorithms based on statistical signal processing.
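The Generalized Gaussian Distribution has density p(x) = β / (2αΓ(1/β)) · exp(-(|x|/α)^β), with scale α and shape β; β = 2 recovers the Gaussian and β = 1 the Laplacian. The Python sketch below evaluates this density and fits it to zero-mean samples (for example, the real or imaginary parts of the DFT coefficients in one frequency bin) by matching the ratio (E|x|)² / E[x²]. The abstract does not say how the authors estimated the parameters, so the moment-matching estimator and the names `ggd_pdf` and `fit_ggd` are illustrative assumptions, not the paper's method.

```python
import math
import random


def ggd_pdf(x: float, alpha: float, beta: float) -> float:
    """Generalized Gaussian density with scale alpha > 0 and shape beta > 0.

    beta = 2 gives a Gaussian with standard deviation alpha / sqrt(2);
    beta = 1 gives a Laplacian with scale alpha.
    """
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x) / alpha) ** beta))


def _moment_ratio(beta: float) -> float:
    # (E|x|)^2 / E[x^2] for a zero-mean GGD; it increases monotonically with beta.
    return math.gamma(2.0 / beta) ** 2 / (math.gamma(1.0 / beta) * math.gamma(3.0 / beta))


def fit_ggd(samples, lo=0.1, hi=10.0, iters=100):
    """Moment-matching estimate of (alpha, beta), assuming zero-mean samples."""
    n = len(samples)
    m1 = sum(abs(s) for s in samples) / n  # E|x|
    m2 = sum(s * s for s in samples) / n   # E[x^2]
    target = m1 * m1 / m2
    # Bisection on beta, valid because the moment ratio is monotonic in beta.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # E[x^2] = alpha^2 * Gamma(3/beta) / Gamma(1/beta), solved for alpha.
    alpha = math.sqrt(m2 * math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    return alpha, beta


if __name__ == "__main__":
    random.seed(0)
    gauss = [random.gauss(0.0, 1.0) for _ in range(200000)]
    print(fit_ggd(gauss))  # shape should come out near 2, scale near sqrt(2)
```

Bisection is used because the moment ratio rises monotonically with β, from near 0 for very heavy-tailed densities toward 3/4 as the density approaches a uniform one, so a single scalar search recovers the shape.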
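For the magnitudes, the Weibull density is p(a) = (k/λ)(a/λ)^(k-1) exp(-(a/λ)^k) for a ≥ 0, with scale λ and shape k. With k = 2 it reduces to the Rayleigh density, which is the magnitude distribution implied by independent, zero-mean, equal-variance Gaussian real and imaginary parts, so the plain Gaussian model appears as a special case. The sketch below assumes a moment-matching fit based on E[A]² / E[A²] = Γ(1+1/k)² / Γ(1+2/k); the estimator and the names `weibull_pdf` and `fit_weibull` are hypothetical and not taken from the paper.

```python
import math
import random


def weibull_pdf(a: float, scale: float, shape: float) -> float:
    """Weibull density of a magnitude a (scale lambda > 0, shape k > 0)."""
    if a < 0.0:
        return 0.0
    if a == 0.0:
        # Value at the origin: diverges for k < 1, equals 1/lambda for k = 1, zero for k > 1.
        return math.inf if shape < 1.0 else (1.0 / scale if shape == 1.0 else 0.0)
    z = a / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def fit_weibull(mags, lo=0.1, hi=20.0, iters=100):
    """Moment-matching estimate of (scale, shape) from positive magnitudes."""
    n = len(mags)
    m1 = sum(mags) / n                # E[A]
    m2 = sum(a * a for a in mags) / n  # E[A^2]
    target = m1 * m1 / m2

    def ratio(k):
        # E[A]^2 / E[A^2] = Gamma(1 + 1/k)^2 / Gamma(1 + 2/k), increasing in k.
        return math.gamma(1.0 + 1.0 / k) ** 2 / math.gamma(1.0 + 2.0 / k)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    # E[A] = lambda * Gamma(1 + 1/k), solved for lambda.
    scale = m1 / math.gamma(1.0 + 1.0 / shape)
    return scale, shape


if __name__ == "__main__":
    random.seed(1)
    # Magnitudes of complex values with unit-variance Gaussian real and imaginary parts.
    mags = [math.hypot(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200000)]
    print(fit_weibull(mags))  # Rayleigh case: shape near 2, scale near sqrt(2)
```

On synthetic Rayleigh data the fitted shape should land near 2; a fitted shape noticeably different from 2 on real measurements is what would indicate a departure from the Gaussian model.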
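The goodness-of-fit figures quoted in the abstract are Jensen-Shannon divergences, JS(P‖Q) = ½KL(P‖M) + ½KL(Q‖M) with M = (P+Q)/2, here taken between a measured histogram and a model's bin probabilities. The sketch below computes this with natural logarithms and compares Gaussian samples against Gaussian and Laplacian models of equal variance. The abstract does not state the logarithm base or the binning, so those choices, the bin edges, and the helper names (`histogram`, `model_probs`, `gaussian_cdf`, `laplacian_cdf`) are assumptions for illustration.

```python
import bisect
import math
import random


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def histogram(samples, edges):
    """Normalized histogram over bins [edges[i], edges[i+1]); out-of-range samples are dropped."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        i = bisect.bisect_right(edges, s) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


def model_probs(cdf, edges):
    """Bin probabilities of a model CDF, renormalized to the histogram range."""
    probs = [cdf(b) - cdf(a) for a, b in zip(edges[:-1], edges[1:])]
    total = sum(probs)
    return [pr / total for pr in probs]


def gaussian_cdf(x, sigma=1.0):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def laplacian_cdf(x, b=1.0 / math.sqrt(2.0)):
    # b = 1/sqrt(2) gives unit variance, matching gaussian_cdf's default.
    return 0.5 * math.exp(x / b) if x < 0.0 else 1.0 - 0.5 * math.exp(-x / b)


if __name__ == "__main__":
    random.seed(2)
    edges = [-4.0 + 0.25 * i for i in range(33)]
    data = histogram([random.gauss(0.0, 1.0) for _ in range(100000)], edges)
    print(js_divergence(data, model_probs(gaussian_cdf, edges)))   # small: matching model
    print(js_divergence(data, model_probs(laplacian_cdf, edges)))  # larger: mismatched model
```

With natural logarithms the divergence is bounded by ln 2; with base-2 logarithms the bound is 1, so values computed in different bases are not directly comparable.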