A New Approach to Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute
- Ningxin Zheng
- Bin Lin
- Quanlu Zhang
- Lingxiao Ma
- Yuqing Yang
- Fan Yang
- Mao Yang
- Lidong Zhou
MSR-TR-2021-20 | Published by Microsoft
Sparsity is arguably becoming the most critical dimension to explore for efficiency and scalability as deep learning models grow significantly larger and more complex. After all, biological neural networks, from which deep learning draws inspiration, are naturally sparse and highly efficient.
We advocate a new approach to model sparsity through a new abstraction called Tensor-with-Sparsity-Attribute (TeSA), which augments the default Tensor abstraction, fundamentally designed for dense models. TeSA allows sparsity attributes and patterns (e.g., for pruning and quantization) to be specified, propagated forward and backward across stages, and used to generate highly efficient, specialized operators that exploit any special sparsity support in the underlying hardware.
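To make the idea concrete, below is a minimal sketch of what a TeSA-style abstraction and one forward-propagation rule could look like. This is an illustrative assumption, not SparGen's actual API: the `TeSA` class and `propagate_matmul` function are hypothetical names, and the rule shown (a fully pruned row of A or column of B makes the corresponding output row or column of C = A @ B provably zero) is just one example of how sparsity attributes might flow across an operator.

```python
# A minimal, hypothetical sketch of a TeSA-like abstraction in NumPy.
# The names TeSA and propagate_matmul are assumptions for illustration,
# not the framework's real interface.
import numpy as np
from dataclasses import dataclass

@dataclass
class TeSA:
    """A tensor augmented with a per-element sparsity attribute.
    attr[i, j] == 0 marks an element as pruned (known-zero)."""
    data: np.ndarray
    attr: np.ndarray  # same shape as data; 0 = pruned, 1 = kept

def propagate_matmul(a: TeSA, b: TeSA) -> np.ndarray:
    """Forward-propagate sparsity attributes through C = A @ B:
    C[i, j] is provably zero when row i of A or column j of B
    is entirely pruned, so a specialized kernel can skip it."""
    out_attr = np.ones((a.data.shape[0], b.data.shape[1]), dtype=np.int8)
    out_attr[~a.attr.any(axis=1), :] = 0  # fully pruned rows of A
    out_attr[:, ~b.attr.any(axis=0)] = 0  # fully pruned columns of B
    return out_attr

# Example: pruning all of row 0 of A makes row 0 of C known-zero.
a = TeSA(np.random.rand(4, 4),
         np.array([[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]))
b = TeSA(np.random.rand(4, 4), np.ones((4, 4), dtype=np.int8))
print(propagate_matmul(a, b))  # row 0 of the output attribute is all zeros
```

In a full system, attributes like these would also propagate backward and guide code generation, so each operator is specialized to the sparsity pattern it actually sees rather than falling back to a generic dense kernel.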
The resulting SparGen framework is flexible, accommodating more than 10 popular sparsity schemes; efficient, delivering more than 8x speedup over existing solutions such as TVM and cuSPARSE; and extensible, incorporating new innovations through new sparsity attributes, propagation rules, optimized sparse operators, or sparsity-aware accelerators.