LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

ICML 2023

Transformer models have achieved remarkable results on a wide range of natural language tasks, but they are often prohibitively large, requiring substantial memory and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low-rank approximation and pruning while avoiding their limitations. Low-rank approximation compresses the coherent and expressive parts of the neurons, while pruning removes the incoherent and non-expressive parts. Pruning enhances the diversity of the low-rank approximation, and the low-rank approximation prevents pruning from discarding too many expressive neurons. We evaluate our method on natural language understanding, question answering, and natural language generation tasks, and show that it significantly outperforms existing compression methods.
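The core idea of a low-rank plus sparse approximation can be illustrated with a minimal sketch. This is not the paper's training procedure (LoSparse learns the factors and prunes structured neuron groups during fine-tuning); it simply decomposes a given matrix post hoc, using a truncated SVD for the low-rank term and magnitude thresholding of the residual for the sparse term. The function name and the `rank` / `keep_ratio` hyperparameters are hypothetical choices for illustration.

```python
import numpy as np

def losparse_style_decompose(W, rank=8, keep_ratio=0.05):
    """Approximate W as a low-rank matrix L plus a sparse matrix S.

    Illustrative sketch only: L is the truncated SVD of W, and S keeps
    the largest-magnitude entries of the residual W - L.
    """
    # Low-rank part: best rank-`rank` approximation of W via truncated SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    # Sparse part: keep only the largest-magnitude entries of the residual.
    R = W - L
    k = int(keep_ratio * R.size)
    threshold = np.sort(np.abs(R), axis=None)[-k] if k > 0 else np.inf
    S = np.where(np.abs(R) >= threshold, R, 0.0)

    return L, S

# Example: decompose a random "weight matrix" and check the reconstruction error.
W = np.random.randn(256, 256)
L, S = losparse_style_decompose(W, rank=8, keep_ratio=0.05)
rel_err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
print(f"relative error of low-rank + sparse approximation: {rel_err:.3f}")
```

Storing `L` as its two factors and `S` in a sparse format is what yields the memory savings: the parameter count drops from m·n to roughly rank·(m + n) plus the number of retained sparse entries.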

GitHub