Temporal Supervised Learning for Inferring a Dialog Policy from Example Conversations
- Lihong Li
- He He
- Jason Williams
Proceedings of the IEEE Spoken Language Technology Workshop (SLT)
Published by IEEE - Institute of Electrical and Electronics Engineers
To appear in the 2014 IEEE Spoken Language Technology Workshop.
This paper tackles the problem of learning a dialog policy from example dialogs – for example, from Wizard-of-Oz style dialogs, where an expert (a person) plays the role of the system. Learning in this setting is challenging because dialog is a temporal process in which actions affect the future course of the conversation – i.e., dialog requires planning. Past work has solved this problem with either conventional supervised learning or reinforcement learning. Reinforcement learning provides a principled approach to planning, but requires resources beyond a fixed corpus of examples, such as a dialog simulator or a reward function. Conventional supervised learning, by contrast, operates directly on example dialogs but does not properly account for planning. We introduce a new algorithm called Temporal Supervised Learning, which learns directly from example dialogs while also properly accounting for planning. The key idea is to choose the next dialog action to maximize the expected discounted accuracy from the current turn until the end of the dialog. On a dialog testbed in the calendar domain, in simulation, we show that a dialog manager trained with temporal supervised learning substantially outperforms a baseline trained with conventional supervised learning.
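The abstract states only the objective, not the algorithm. As a minimal sketch of the criterion it describes, the snippet below computes, for each turn of an example dialog, the discounted agreement between a candidate policy and the expert's actions over the remainder of the dialog; conventional supervised learning would instead score each turn's agreement in isolation. The `Turn` structure, the feature representation, the `policy` callable, and the discount value `GAMMA = 0.9` are illustrative assumptions, not the paper's actual design.

```python
# Sketch: discounted-accuracy evaluation of a candidate dialog policy
# against an expert's example dialog. All names here are hypothetical.
from dataclasses import dataclass
from typing import Callable, List, Sequence

GAMMA = 0.9  # discount factor (assumed; not specified in the abstract)

@dataclass
class Turn:
    features: Sequence[float]  # dialog-state features at this turn
    expert_action: int         # action the expert (wizard) took

def discounted_accuracy(dialog: List[Turn],
                        policy: Callable[[Sequence[float]], int]) -> List[float]:
    """For each turn t, return sum_{k>=0} GAMMA**k * match(t+k), where
    match(t) is 1 when the policy agrees with the expert at turn t."""
    matches = [1.0 if policy(t.features) == t.expert_action else 0.0
               for t in dialog]
    targets = [0.0] * len(dialog)
    running = 0.0
    for i in reversed(range(len(dialog))):  # backward pass accumulates the sum
        running = matches[i] + GAMMA * running
        targets[i] = running
    return targets

# Example: a trivial policy that always picks action 0, on a 3-turn dialog.
dialog = [Turn([0.2, 0.8], 0), Turn([0.5, 0.5], 1), Turn([0.9, 0.1], 0)]
print(discounted_accuracy(dialog, lambda f: 0))  # [1.81, 0.9, 1.0]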
© IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.