The Audio Epitome: A New Representation for Modeling and Classifying Auditory Phenomena
This paper presents a novel representation for auditory environments that can be used to classify events of interest, such as speech or cars, and potentially to classify the environments themselves. We propose a discriminative framework based on the audio epitome, an audio extension of the image representation developed by Jojic et al. [3]. We also develop an informative patch sampling procedure for training the epitomes, which reduces the computational complexity and increases the quality of the resulting epitome. For classification, the training data is used to learn distributions over the epitomes that model the different classes; the distributions inferred for new inputs are then compared to these class models. On a task of distinguishing among four auditory classes of environmental sounds (car, speech, birds, utensils), our method outperforms the conventional approaches of nearest neighbor and mixture of Gaussians on three of the four classes.
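To make the overall pipeline concrete, the following is a minimal sketch, not the authors' algorithm: a true epitome is a single compact spectrogram-like model learned generatively from overlapping patches [3], whereas this sketch approximates the idea with a flat set of patch prototypes fit to log-spectrogram patches, and models each class by its distribution of patch-to-prototype assignments. All function names (`extract_patches`, `fit_epitome`, `assignment_histogram`, `classify`) and parameter values are hypothetical.

```python
# Hedged sketch of epitome-style patch modeling for audio classification.
# Assumption: prototypes stand in for epitome locations; classes are
# modeled by normalized assignment histograms, as the abstract's
# "distributions over the epitomes."
import numpy as np

def extract_patches(log_spec, patch=(8, 8), step=4):
    """Slide a window over a log-spectrogram (freq x time) and flatten patches."""
    F, T = log_spec.shape
    pf, pt = patch
    return np.array([log_spec[f:f + pf, t:t + pt].ravel()
                     for f in range(0, F - pf + 1, step)
                     for t in range(0, T - pt + 1, step)])

def fit_epitome(patches, n_proto=16, n_iter=20, seed=0):
    """Fit patch prototypes with simple k-means (a hard-assignment stand-in
    for the generative epitome learning in [3])."""
    rng = np.random.default_rng(seed)
    means = patches[rng.choice(len(patches), n_proto, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((patches[:, None, :] - means[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for k in range(n_proto):
            members = patches[labels == k]
            if len(members):
                means[k] = members.mean(0)
    return means

def assignment_histogram(patches, means):
    """Class model: normalized histogram of nearest-prototype assignments."""
    dists = ((patches[:, None, :] - means[None]) ** 2).sum(-1)
    hist = np.bincount(dists.argmin(1), minlength=len(means)).astype(float)
    return hist / hist.sum()

def classify(patches, means, class_hists):
    """Pick the class whose stored histogram is closest (L1) to the input's."""
    h = assignment_histogram(patches, means)
    return min(class_hists, key=lambda c: np.abs(class_hists[c] - h).sum())
```

In this sketch, training amounts to calling `fit_epitome` on patches pooled across classes and storing one `assignment_histogram` per class; a new clip is labeled by `classify`. The informative patch sampling the paper describes would replace the uniform grid in `extract_patches` with a selection biased toward high-information patches.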