SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute
- Ningxin Zheng,
- Bin Lin,
- Quanlu Zhang,
- Lingxiao Ma,
- Yuqing Yang,
- Fan Yang,
- Yang Wang,
- Mao Yang,
- Lidong Zhou
Sparsity is arguably becoming the most critical dimension to explore for efficiency and scalability, as deep learning models grow significantly larger and more complex. After all, biological neural networks, from which deep learning draws inspiration, are naturally sparse and highly efficient.
We advocate an end-to-end approach to model sparsity via a new abstraction called Tensor-with-Sparsity-Attribute (TeSA), which augments the default Tensor abstraction that is fundamentally designed for dense models. TeSA enables sparsity attributes and patterns (e.g., for pruning and quantization) to be specified, propagated forward and backward across the entire deep learning model, and used to create highly efficient, specialized operators, taking into account the execution efficiency of different sparsity patterns on different (sparsity-aware) hardware. The resulting SparTA framework can accommodate various sparsity patterns and optimization techniques, delivering a 1.7x to 8.4x average speedup in inference latency over seven state-of-the-art (sparse) solutions, with smaller memory footprints. As an end-to-end model sparsity framework, SparTA also helps sparsity algorithms explore better sparse models.
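To make the abstraction concrete, the following is a minimal sketch of a TeSA-like structure and one attribute-propagation rule for matrix multiplication, written in Python/NumPy. The class name `TeSA`, the `propagate_matmul` helper, and the binary-mask attribute are illustrative assumptions for exposition, not SparTA's actual API; the paper's attributes also cover richer properties such as quantization bit widths.

```python
import numpy as np

class TeSA:
    """Hypothetical sketch of a Tensor-with-Sparsity-Attribute: a dense
    tensor paired with a per-element sparsity attribute (here, a binary
    pruning mask). Not SparTA's real interface."""
    def __init__(self, values: np.ndarray, attr: np.ndarray):
        assert values.shape == attr.shape
        self.values = values
        self.attr = attr  # 1 = kept, 0 = pruned

def propagate_matmul(a: TeSA, b: TeSA) -> TeSA:
    """Forward-propagate sparsity attributes through C = A @ B:
    C[i, j] is provably zero if every product a[i, k] * b[k, j]
    along the shared dimension k involves a pruned element."""
    out_attr = (a.attr @ b.attr != 0).astype(np.int8)
    out_vals = (a.values * a.attr) @ (b.values * b.attr)
    return TeSA(out_vals, out_attr)

# Example: pruning entire rows of A makes the matching rows of C
# provably zero, which a code generator could exploit to emit a
# specialized kernel that skips those rows entirely.
a = TeSA(np.random.randn(4, 4),
         np.array([[1]*4, [0]*4, [1]*4, [0]*4], dtype=np.int8))
b = TeSA(np.random.randn(4, 4), np.ones((4, 4), dtype=np.int8))
c = propagate_matmul(a, b)
print(c.attr)  # rows 1 and 3 are all zeros
```

Propagating attributes in this way, across the whole model rather than one operator at a time, is what lets an end-to-end framework match each resulting sparsity pattern to the kernel and hardware that execute it most efficiently.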