Final intern talk: Improving Frechet Audio Distance for Generative Music Evaluation

As generative music models become more powerful and popular, there is a growing need for robust objective metrics of music quality that correlate with human perception. The Frechet Audio Distance (FAD) is a commonly used metric for this purpose. However, its performance may be hampered by issues including sample size bias, limitations of the underlying audio embeddings, and the use of low-quality reference sets. We propose reducing sample size bias by extrapolating FAD scores to an infinite sample size, which yields an unbiased score. A comparison of various audio embeddings reveals that some are better suited than others for deriving FAD scores that capture aspects of musical or acoustic quality. Finally, our experiments underscore the importance of choosing a diverse, high-quality reference dataset for FAD calculation. Listening test results indicate that unbiased FAD scores calculated using suitable embeddings and reference music improve correlation with human ratings of musical and acoustic quality.
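The two computations described in the abstract, FAD itself and its extrapolation to an infinite sample size, can be sketched in NumPy as follows. This is a minimal illustration, not the fadtk code. It makes several assumptions. Embeddings have already been extracted as (num_samples, dim) arrays. Each set is modeled as a Gaussian, so FAD = ||mu_r - mu_e||^2 + tr(S_r) + tr(S_e) - 2 tr((S_r S_e)^(1/2)). FAD is roughly linear in 1/N, so a straight-line fit over several evaluation-set sizes N can be read off at 1/N = 0. Only the evaluation set is subsampled while the reference statistics stay fixed. All function names and sample sizes here are hypothetical.

```python
import numpy as np


def _gaussian_stats(emb):
    """Mean vector and covariance matrix of embeddings shaped (num_samples, dim)."""
    return emb.mean(axis=0), np.cov(emb, rowvar=False)


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + tr(sigma1) + tr(sigma2) - 2 tr((sigma1 sigma2)^(1/2))."""
    diff = mu1 - mu2
    # sigma1 sigma2 has the same eigenvalues as s1h sigma2 s1h (s1h = sigma1^(1/2)),
    # and the latter is symmetric PSD, so eigvalsh gives real, stable eigenvalues.
    s1h = _sqrtm_psd(sigma1)
    inner = s1h @ sigma2 @ s1h
    inner = (inner + inner.T) / 2.0  # remove round-off asymmetry
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fad(ref_emb, eval_emb):
    """FAD between a reference embedding set and an evaluation embedding set."""
    return frechet_distance(*_gaussian_stats(ref_emb), *_gaussian_stats(eval_emb))


def extrapolate_to_infinity(sample_sizes, scores):
    """Fit score ~ intercept + slope / N and return the intercept (the N -> inf value).
    Assumption: the sample-size bias is approximately linear in 1/N."""
    inv_n = 1.0 / np.asarray(sample_sizes, dtype=float)
    slope, intercept = np.polyfit(inv_n, np.asarray(scores, dtype=float), deg=1)
    return float(intercept)


def fad_inf(ref_emb, eval_emb, sample_sizes, seed=0):
    """Hypothetical helper: FAD on random evaluation subsets of each size, then extrapolate."""
    rng = np.random.default_rng(seed)
    mu_r, sigma_r = _gaussian_stats(ref_emb)  # assumption: reference stats computed once
    scores = []
    for n in sample_sizes:
        idx = rng.choice(len(eval_emb), size=n, replace=False)
        scores.append(frechet_distance(mu_r, sigma_r, *_gaussian_stats(eval_emb[idx])))
    return extrapolate_to_infinity(sample_sizes, scores)
```

For example, given at least 3000 evaluation embeddings, fad_inf(ref_emb, eval_emb, [500, 1000, 2000, 3000]) returns the extrapolated score. The trace term goes through the symmetric matrix sigma1^(1/2) sigma2 sigma1^(1/2) rather than a general matrix square root of sigma1 sigma2. This avoids the small complex-valued round-off that the general route can produce.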

Paper: https://arxiv.org/abs/2311.01616
Code: https://github.com/microsoft/fadtk

Date: