Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations

Arjun Srinivasan; Nikhila Nyapathy; Bongshin Lee; Steven Drucker; John Stasko

Proceedings of the ACM Conference on Human Factors in Computing Systems | May 2021

Natural language interfaces (NLIs) for data visualization are becoming increasingly popular both in academic research and in commercial software. Yet, there is a lack of empirical understanding of how people specify visualizations through natural language. We conducted an online study (N = 102), showing participants a series of visualizations and asking them to provide utterances they would pose to generate the displayed charts. From the responses, we curated a dataset of 893 utterances and characterized the utterances according to (1) their phrasing (e.g., commands, queries, questions) and (2) the information they contained (e.g., chart types, data aggregations). To help guide future research and development, we contribute this utterance dataset and discuss its applications toward the creation and benchmarking of NLIs for visualization.