Generation Probabilities are Not Enough: Improving Error Highlighting for AI Code Suggestions
- Helena Vasconcelos
- Gagan Bansal
- Adam Fourney
- Q. Vera Liao
- Jennifer Wortman Vaughan
NeurIPS Workshop on Human-Centered AI
Large-scale generative models are increasingly used in tooling applications. As one prominent example, code generation models recommend code completions within an IDE to help programmers author software. However, because these models are imperfect, their erroneous recommendations can introduce bugs or even security vulnerabilities into a code base if a human user does not override them. To override such errors, users must first detect them. One way to assist detection is to highlight tokens with low generation probabilities. We propose an alternative: highlighting the tokens that people are likely to edit in a generation, as predicted by an edit model. Through a mixed-methods, pre-registered study with N = 30 participants, we find that the edit model highlighting strategy leads to significantly faster task completion and significantly more localized edits, and that participants strongly preferred it.
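
To make the baseline strategy concrete, here is a minimal sketch of highlighting tokens whose generation probability falls below a threshold. It is an illustration under stated assumptions, not the study's implementation: the function name `highlight_low_probability`, the 0.5 threshold, the `[[ ]]` markers, and the example probabilities are all hypothetical.

```python
from typing import List, Tuple


def highlight_low_probability(
    tokens: List[Tuple[str, float]],
    threshold: float = 0.5,   # hypothetical cutoff, not taken from the paper
    open_mark: str = "[[",    # hypothetical plain-text stand-in for a UI highlight
    close_mark: str = "]]",
) -> str:
    """Wrap every token whose probability is below `threshold` in markers.

    `tokens` is a list of (token_text, generation_probability) pairs in
    the order the model emitted them.
    """
    pieces = []
    for text, prob in tokens:
        if prob < threshold:
            # Low-confidence token: flag it for the programmer's attention.
            pieces.append(f"{open_mark}{text}{close_mark}")
        else:
            pieces.append(text)
    return "".join(pieces)


# Made-up probabilities for a suggested completion, for illustration only.
suggestion = [("return", 0.97), (" ", 0.99), ("a", 0.91), (" - ", 0.32), ("b", 0.88)]
print(highlight_low_probability(suggestion))  # return a[[ - ]]b
```

An edit-model strategy would presumably replace the generation probability with a predicted likelihood that a person edits the token and highlight tokens where that likelihood is high; the abstract does not specify how the edit model is built, so that part is not sketched here.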