GLGE: A New General Language Generation Evaluation Benchmark

Dayiheng Liu; Yu Yan; Yeyun Gong; Weizhen Qi; Hang Zhang; Jian Jiao; Wei Chen; Jie Fu; Linjun Shou; Ming Gong (YIMING); Pengcheng Wang; Jiusheng Chen; Daxin Jiang (姜大昕); Jiancheng Lv; Ruofei Zhang; Winnie Wu; Ming Zhou; Nan Duan

ACL-IJCNLP 2021 | November 2020

Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress in pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on Natural Language Understanding (NLU) tasks and do not consider Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we further design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard), which yields 24 subtasks for comprehensively comparing model performance. To encourage research on pretraining and transfer learning for NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet (the source code and dataset will be publicly available at this https URL).

Publication Downloads

GLGE

May 13, 2021

The General Language Generation Evaluation (GLGE) benchmark is a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks.