Evaluating and Complementing Vision-to-Language Technology for People who are Blind with Conversational Crowdsourcing

Elliot Salisbury; Ece Kamar; Meredith Ringel Morris

Proceedings of IJCAI 2018 | July 2018

We study how real-time crowdsourcing can be used both for evaluating the value provided by existing automated approaches and for enabling workflows that provide scalable and useful alt text to blind users. We show that the shortcomings of existing AI image captioning systems frequently hinder a user’s understanding of an image they cannot see to a degree that even clarifying conversations with sighted assistants cannot correct. Based on analysis of clarifying conversations collected from our studies, we design experiences that can effectively assist users in a scalable way without the need for real-time interaction. Our results provide lessons and guidelines that the designers of future AI captioning systems can use to improve labeling of social media imagery for blind users.

Combining Human and Machine Intelligence to Describe Images to People with Vision Impairments

This talk was presented as part of the CVPR 2020 VizWiz Grand Challenge Workshop. More information about the workshop can be found at https://vizwiz.org/workshops/2020-workshop/.