Interspeech 2018 Low Resource Automatic Speech Recognition Challenge for Indian Languages

  • Brij Mohan Lal Srivastava
  • Rupesh Kumar Mehta
  • Krishna Doss Mohan
  • Pallavi Matani
  • Sandeepkumar Satpal
  • Kalika Bali
  • Radhakrishnan Srikanth
  • Niranjan Nayak

SLTU

India has more than 1500 languages, 30 of which are spoken by more than one million native speakers. Most of them are low-resource and could greatly benefit from speech and language technologies. Building speech recognition support for these low-resource languages requires innovation in handling constraints on data size, while also exploiting the unique properties of, and similarities among, Indian languages. With this goal, we organized a low-resource Automatic Speech Recognition challenge for Indian languages as part of Interspeech 2018. We released 50 hours of transcribed speech data for each of Tamil, Telugu and Gujarati, for a total of 150 hours. To preserve the low-resource setting, participants were required to use only the data we released for the challenge; however, they were not restricted to working on any particular aspect of the speech recognizer. We received 109 submissions from 18 research groups and evaluated the systems in terms of Word Error Rate on a blind test set. In this paper, we summarize the data, approaches and results of the challenge.
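For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the scoring tool or text normalization actually used in the challenge.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization by whitespace is an assumption,
    not the challenge's documented scoring procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference. Scoring over a whole test set is usually done by summing errors and reference words across all utterances before dividing, rather than averaging per-utterance rates.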