FastSeq: Make Sequence Generation Faster

  • ,
  • Fei Hu ,
  • Jiusheng Chen ,
  • Nikhil Bhendawade ,
  • Ting Ye ,
  • ,
  • Nan Duan ,
  • Desheng Cui ,
  • Bingyu Chi ,
  • Ruofei Zhang

2021 Meeting of the Association for Computational Linguistics

Transformer-based models have made tremendous impacts in natural language generation. However, inference speed is a bottleneck due to the large model size and the intensive computation involved in the auto-regressive decoding process. We develop the FastSeq framework to accelerate sequence generation without accuracy loss. The proposed optimization techniques include an attention cache optimization, an efficient algorithm for detecting repeated n-grams, and an asynchronous generation pipeline with parallel I/O. These optimizations are general enough to be applicable to Transformer-based models (e.g., T5, GPT2, and UniLM). Our benchmark results on a set of widely used and diverse models demonstrate a 4-9x inference speed gain. Additionally, FastSeq is easy to use with a simple one-line code change. The source code is available at https://github.com/microsoft/fastseq.
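The abstract does not spell out how the repeated n-gram detection works; as a hedged illustration of the general idea (not FastSeq's actual implementation, which runs on GPU), the check can be made cheap by indexing generated n-grams in a hash map incrementally, so each decoding step costs O(n) instead of rescanning the whole prefix:

```python
class NGramBlocker:
    """Hypothetical sketch of repeated-n-gram blocking: track every
    generated n-gram incrementally so each decoding step only touches
    the last n tokens instead of rescanning the whole sequence."""

    def __init__(self, n):
        self.n = n
        self.seen = {}    # (n-1)-gram prefix -> set of tokens that followed it
        self.tokens = []

    def push(self, token):
        """Record a newly generated token and the n-gram it completes."""
        self.tokens.append(token)
        if len(self.tokens) >= self.n:
            prefix = tuple(self.tokens[-self.n:-1])
            self.seen.setdefault(prefix, set()).add(token)

    def banned(self):
        """Tokens that would repeat an already-generated n-gram."""
        if len(self.tokens) < self.n - 1:
            return set()
        prefix = tuple(self.tokens[-(self.n - 1):]) if self.n > 1 else ()
        return self.seen.get(prefix, set())
```

For example, with n=2 and the generated sequence [1, 2, 3, 1], the current prefix is (1,), and token 2 is banned because the bigram (1, 2) has already appeared.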

Publication Downloads

FastSeq

December 14, 2021

FastSeq provides efficient implementations of popular sequence models (e.g., BART, ProphetNet) for text generation, summarization, translation, and related tasks. It automatically optimizes inference speed on top of popular NLP toolkits (e.g., FairSeq and HuggingFace-Transformers) without accuracy loss.
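As a concrete illustration of the kind of optimization involved: in beam search, the encoder-side attention cache is identical for every beam of a sample, so it can be stored once and broadcast rather than tiled per beam. A minimal NumPy sketch of this idea (hypothetical, not FastSeq's actual implementation) verifies that both forms produce the same result:

```python
import numpy as np

def cross_attention(q, k, v):
    """Plain single-head scaled dot-product attention."""
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
beam_size, src_len, d = 4, 6, 8
k = rng.normal(size=(1, src_len, d))    # one shared encoder key cache
v = rng.normal(size=(1, src_len, d))    # one shared encoder value cache
q = rng.normal(size=(beam_size, 1, d))  # per-beam decoder queries

# Naive approach: tile the cache beam_size times before attending.
out_tiled = cross_attention(
    q, np.repeat(k, beam_size, axis=0), np.repeat(v, beam_size, axis=0)
)
# Cache-sharing approach: broadcast against the single stored copy,
# saving a beam_size-fold factor of memory and bandwidth.
out_shared = cross_attention(q, k, v)
assert np.allclose(out_tiled, out_shared)
```

Because the two computations are mathematically identical, this kind of cache sharing speeds up decoding with no effect on output quality, consistent with the "without accuracy loss" claim above.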