Building Natural Language Interfaces to Web APIs
- Yu Su
- Ahmed Awadallah
- Madian Khabsa
- Patrick Pantel
- Michael Gamon
In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM '17)
Published by ACM
As the Web evolves towards a service-oriented architecture, application programming interfaces (APIs) are becoming an increasingly important way to provide access to data, services, and devices. We study the problem of building natural language interfaces to APIs (NL2APIs), with a focus on web APIs for web services. NL2APIs have many potential benefits; for example, they can facilitate the integration of web services into virtual assistants.
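To make the target of an NL2API concrete, the sketch below shows one way a formal API call could be represented and rendered as a request. It is a hypothetical illustration, not the paper's representation: the `ApiCall` class and its fields are invented here, and it assumes OData-style query options (`$filter`, `$orderby`, `$top`), which Microsoft Graph does support. The pairing of an NL command with this call is likewise only an example.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import quote


@dataclass
class ApiCall:
    """Hypothetical representation of a RESTful API call with OData-style
    query options: the kind of formal output an NL2API must produce."""
    method: str
    base_url: str
    resource: str
    # Query option name (e.g. "$filter") -> value; insertion order is kept.
    params: Dict[str, str] = field(default_factory=dict)

    def to_url(self) -> str:
        """Render the call as '<METHOD> <URL>', percent-encoding option values."""
        url = f"{self.base_url}/{self.resource}"
        if self.params:
            query = "&".join(
                f"{name}={quote(value, safe='')}" for name, value in self.params.items()
            )
            url = f"{url}?{query}"
        return f"{self.method} {url}"


# Illustrative pairing (assumed, not from the dataset):
# "show me my 5 most recent unread emails" might map to this call.
call = ApiCall(
    method="GET",
    base_url="https://graph.microsoft.com/v1.0",
    resource="me/messages",
    params={"$filter": "isRead eq false", "$orderby": "receivedDateTime desc", "$top": "5"},
)
print(call.to_url())
```

Keeping the parameters as structured data, rather than as a raw string, is what lets a learned model predict them piece by piece and lets the call be rendered deterministically afterwards.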
We propose the first end-to-end framework to build an NL2API for a given web API. A key challenge is collecting training data, i.e., pairs of NL commands and API calls, from which an NL2API can learn the semantic mapping from ambiguous, informal NL commands to formal API calls. We propose a novel approach to collect training data for NL2APIs via crowdsourcing, in which crowd workers are employed to generate diversified NL commands. We further optimize the crowdsourcing process to reduce its cost. More specifically, we propose a novel hierarchical probabilistic model of the crowdsourcing process, which guides us to allocate the budget to those API calls that have high value for training NL2APIs. We apply our framework to real-world APIs and show that it can collect high-quality training data at low cost and build NL2APIs with good performance from scratch. We also show that our modeling of the crowdsourcing process can improve its effectiveness: the training data collected via our approach leads to better NL2API performance than a strong baseline.
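The idea of spending a fixed crowdsourcing budget on the API calls that are most valuable for training can be illustrated with a much simpler mechanism than the one described above. The sketch below is a deliberately simplified stand-in, not the paper's hierarchical probabilistic model. It assumes each API call already has a hypothetical value score, and that the k-th additional paraphrase of a call is worth `value / (k + 1)` (diminishing returns). Under those assumptions it greedily assigns one paraphrase request at a time.

```python
import heapq
from typing import Dict


def allocate_budget(values: Dict[str, float], budget: int) -> Dict[str, int]:
    """Greedily assign `budget` paraphrase requests to API calls.

    Hypothetical stand-in for value-driven allocation (NOT the paper's model):
    the marginal value of the (k+1)-th paraphrase for a call is assumed to be
    values[call] / (k + 1), so repeated requests for one call pay off less.
    """
    counts = {call: 0 for call in values}
    # heapq is a min-heap, so store negated marginal values; ties are broken
    # by the call identifier, which keeps the result deterministic.
    heap = [(-value, call) for call, value in values.items()]
    heapq.heapify(heap)
    for _ in range(budget):
        if not heap:
            break
        _, call = heapq.heappop(heap)
        counts[call] += 1
        # Re-insert the call with its next (smaller) marginal value.
        heapq.heappush(heap, (-values[call] / (counts[call] + 1), call))
    return counts


# Hypothetical value scores for three API calls and a budget of 4 requests.
print(allocate_budget({"A": 4.0, "B": 2.0, "C": 1.0}, budget=4))
```

The diminishing-returns assumption is what spreads the budget across calls rather than spending it all on the single highest-value one. A model of the crowdsourcing process would replace the fixed scores with estimated ones.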
Publication Downloads
Natural Language Interfaces to Web APIs Dataset
April 26, 2019
The NL2API dataset contains web API calls from two APIs in the Microsoft Graph API suite, used respectively to search a user’s emails and calendar events. Each data point includes the API call, its canonical form, its associated natural language utterances, and the API properties.
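A data point as described above could be modeled and turned into training pairs roughly as follows. This is a sketch under assumptions: the class name, field names, and example values are hypothetical, and the dataset's actual file layout is not specified here. It only mirrors the four components listed (API call, canonical form, utterances, API properties) and flattens each record into one (utterance, API call) pair per non-blank utterance.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NL2APIRecord:
    """One data point with hypothetical field names mirroring the description:
    API call, canonical form, natural language utterances, API properties."""
    api_call: str
    canonical_form: str
    utterances: List[str]
    properties: Dict[str, str] = field(default_factory=dict)


def to_training_pairs(records: List[NL2APIRecord]) -> List[Tuple[str, str]]:
    """Flatten records into (utterance, API call) pairs, skipping blank utterances."""
    return [
        (utterance.strip(), record.api_call)
        for record in records
        for utterance in record.utterances
        if utterance.strip()
    ]


# Hypothetical record; the values are illustrative, not taken from the dataset.
record = NL2APIRecord(
    api_call="GET /me/messages?$filter=isRead eq false",
    canonical_form="message that is not read",
    utterances=["unread emails", " show my unread mail ", "   "],
    properties={"isRead": "Boolean"},
)
print(to_training_pairs([record]))
```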