Project VALL-E A neural codec language model for speech synthesis We introduce a language modeling approach for text-to-speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf…
Publication Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu ICASSP 2023 | June 2023
Publication DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation Shuo Wang, Xiangyu Kong, Xiulian Peng, Hesam Movassagh, Vinod Prakash, Yan Lu ICASSP 2023 | June 2023
Publication Real-Time Audio-Visual End-To-End Speech Enhancement Zirun Zhu, Hemin Yang, Min Tang, Ziyi Yang, Sefik Emre Eskimez, Huaming Wang 2023 IEEE International Conference on Acoustics, Speech and Signal Processing | June 2023
Publication Speech MOS multi-task learning and rater bias correction Haleh Akrami, Hannes Gamper IEEE ICASSP | June 2023
Publication MuseCoco: Generating Symbolic Music from Text Peiling Lu, Xin Xu, Chenfei Kang, Botao Yu, Chengyi Xing, Xu Tan, Jiang Bian June 2023
Publication Leveraging Pretrained Representations with Task-related Keywords for Alzheimer’s Disease Detection Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng ICASSP 2023 | June 2023
Publication BEATs: Audio Pre-Training with Acoustic Tokenizers Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei ICML 2023 | June 2023
Publication Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments Julian Neri, Sebastian Braun 2023 International Conference on Acoustics, Speech, and Signal Processing | June 2023
Publication A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng ICASSP 2023 | June 2023