A Dataset and Baselines for Multilingual Reply Suggestion

Mozhi Zhang; Wei Wang; Budhaditya Deb; Guoqing Zheng; Milad Shokouhi; Ahmed Awadallah

ACL-IJCNLP 2021 | July 2021

Published by ACL 2021

Reply suggestion models help users process emails and chats faster. Previous work only studies English reply suggestion. Instead, we present MRS, a multilingual reply suggestion dataset with ten languages. MRS can be used to compare two families of models: 1) retrieval models that select the reply from a fixed set and 2) generation models that produce the reply from scratch. Therefore, MRS complements existing cross-lingual generalization benchmarks that focus on classification and sequence labeling tasks. We build a generation model and a retrieval model as baselines for MRS. The two models have different strengths in the monolingual setting, and they require different strategies to generalize across languages. MRS is publicly available at https://github.com/zhangmozhi/mrs.
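To make the retrieval family concrete, the following is a minimal sketch of selecting a reply from a fixed set. It is not the baseline described in the paper: the bag-of-words encoder, the cosine scoring, the `REPLY_SET` contents, and the function names are all assumptions chosen for illustration. A generation model would instead produce the reply token by token and is not sketched here.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with counts (a toy text encoder, assumed)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical fixed reply set: each candidate reply is paired with an
# example message that elicited it. Real systems use far larger sets.
REPLY_SET = {
    "Sure, see you then.": "can we meet tomorrow at noon",
    "Thanks, I received it.": "i sent you the report",
    "Congratulations!": "i got the job",
}


def suggest_reply(message):
    """Return the candidate whose example message best matches the input."""
    query = bag_of_words(message)
    return max(
        REPLY_SET,
        key=lambda reply: cosine(query, bag_of_words(REPLY_SET[reply])),
    )
```

Because the output is always drawn from `REPLY_SET`, a retrieval model of this kind can only return replies that were vetted in advance, whereas a generation model is not restricted to a fixed set.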

Publication downloads

Baselines for Multilingual Reply Suggestion (MRS)

August 1, 2021

Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement-learning-guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has quality comparable to the original data with respect to readability and class consistency.
