Recent Efforts Towards Efficient And Scalable Neural Waveform Coding

Acoustic signal compression techniques, which convert a floating-point waveform into a bitstream representation, serve as a cornerstone of the current data storage and telecommunication infrastructure. The rise of data-driven approaches to acoustic coding brings not only potential but also challenges, among which model complexity is a major concern: on the one hand, this general-purpose computational paradigm delivers superior performance; on the other hand, most codecs are deployed on low-power devices that can barely afford the computational overhead. In this talk, I will introduce several of our recent efforts towards a better trade-off between performance and efficiency in neural speech/audio coding. I will present cascaded cross-module residual learning, which performs multistage quantization within a deep learning framework; in addition, I will discuss a collaborative quantization scheme that simultaneously binarizes the linear predictive coefficients and the corresponding residuals. If time permits, I will also discuss a novel perceptually salient objective function with psychoacoustic calibration.

Speaker Bios

Kai Zhen is a Ph.D. candidate (ABD) in Computer Science and Cognitive Science at Indiana University, advised by Prof. Minje Kim. He has been working on efficient and scalable neural waveform coding systems. He completed two machine learning and relevance internships at LinkedIn, in 2018 and 2019, followed by an internship at Amazon Alexa in 2020.

Date:
Speakers:
Kai Zhen
Affiliation:
Indiana University

Series: Microsoft Research Talks