The FATE Landscape of Sign Language AI Datasets: An Interdisciplinary Perspective

  • Naomi Caselli
  • Julie A. Hochgesang
  • Matt Huenerfauth
  • Leah Katz-Hernandez
  • Raja Kushalnagar
  • Christian Vogler
  • Richard E. Ladner

ACM Transactions on Accessible Computing, Vol 14(2)

Sign language datasets are essential to developing many sign language technologies. In particular, datasets are required for training artificial intelligence (AI) and machine learning (ML) systems. Though the idea of using AI/ML for sign languages is not new, technology has now advanced to a point where developing such sign language technologies is becoming increasingly tractable. This critical juncture provides an opportunity to be thoughtful about an array of Fairness, Accountability, Transparency, and Ethics (FATE) considerations. Sign language datasets typically contain recordings of people signing, and such recordings are highly personal. The rights and responsibilities of the parties involved in data collection and storage are also complex: they involve individual data contributors, data collectors or owners, and data users, who may interact through a variety of exchange and access mechanisms. Deaf community members (and signers more generally) are also central stakeholders in any end applications of sign language data. The centrality of sign language to deaf cultural identity, coupled with a history of oppression, makes the use of this data by technologists particularly sensitive. This piece presents many of the issues that characterize working with sign language AI datasets, based on the authors’ experiences living, working, and studying in this space.