DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization
- Ziming Mao,
- Chen Henry Wu,
- Ansong Ni,
- Yushen Zhang,
- Rui Zhang,
- Tao Yu,
- Budhaditya Deb,
- Chenguang Zhu,
- Ahmed Awadallah,
- Dragomir Radev
ACL 2022
Transformer-based models have achieved state-of-the-art performance on short-text summarization, but they still struggle with long inputs. In this paper, we present DYLE, a new approach for long-input summarization: Dynamic Latent Extraction for Abstractive Summarization. We jointly train an extractor and a generator, treating the extracted text snippets as a latent variable. We propose extractive oracles to provide the extractor with a strong learning signal, and we introduce a consistency loss that encourages the extractor to approximate the averaged dynamic weights predicted by the generator. We conduct extensive experiments on two long-input summarization datasets, GovReport (document) and QMSum (dialogue). Our model significantly outperforms the current state of the art, including a 6.21 ROUGE-2 improvement on GovReport and a 2.13 ROUGE-1 improvement on QMSum. Further analysis shows that the dynamic weights make our generation process highly interpretable. Our code will be publicly available upon publication.
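To make the consistency objective concrete, here is a minimal sketch in PyTorch. It assumes the extractor scores each candidate snippet with a logit and the generator emits a distribution over snippets at every decoding step (its dynamic weights); the function name `consistency_loss`, the tensor shapes, and the use of a KL term are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(extractor_logits: torch.Tensor,
                     generator_weights: torch.Tensor) -> torch.Tensor:
    """Push the extractor's snippet distribution toward the generator's
    dynamic weights, averaged over decoding steps (illustrative sketch).

    extractor_logits:  (num_snippets,) unnormalized extractor scores.
    generator_weights: (num_steps, num_snippets), one distribution over
                       snippets per decoding step.
    """
    # Average the per-step dynamic weights into a single target
    # distribution, and detach it so gradients flow only to the extractor.
    target = generator_weights.mean(dim=0).detach()
    log_probs = F.log_softmax(extractor_logits, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as
    # target; 'sum' reduction since there is no batch dimension here.
    return F.kl_div(log_probs, target, reduction="sum")

# Toy usage: 8 candidate snippets, 5 decoding steps.
logits = torch.randn(8, requires_grad=True)
weights = torch.softmax(torch.randn(5, 8), dim=-1)
loss = consistency_loss(logits, weights)
loss.backward()  # gradients reach only the extractor logits
```

Treating the averaged generator weights as a fixed target mirrors the abstract's description: the extractor is trained to approximate what the generator actually attends to while decoding.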