Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework
- Wenxiao Wang
- Minghao Chen
- Shuai Zhao
- Long Chen
- Jinming Hu
- Hai Liu
- Deng Cai
- Xiaofei He
- Wei Liu
ICML 2021
Neural network pruning is one of the most popular methods for model acceleration. Most pruning methods, such as filter-level or layer-level pruning, prune the model along a single dimension (depth, width, or resolution) to meet a computational cost requirement. However, such a policy often reduces that one dimension excessively, causing a large loss in accuracy. To alleviate this issue, we argue that pruning should be done comprehensively along all three dimensions. To this end, our framework formulates pruning as an optimization problem. Specifically, it first fits the relation between the model’s accuracy and its depth, width, and resolution via polynomial regression, and then maximizes the polynomial under the computational cost constraint to obtain optimal values for the three dimensions. Finally, the model is pruned along the three dimensions accordingly. Because collecting the large amount of data needed for the regression is time-consuming, we propose two approaches to lower the cost: (1) specializing the polynomial so that the regression stays accurate even with less data; (2) employing iterative pruning and fine-tuning to collect data faster. Extensive experiments show that our algorithm outperforms state-of-the-art pruning and even NAS-based algorithms.
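The fit-then-maximize step described above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a generic full quadratic in the depth, width, and resolution ratios `d`, `w`, `r` (the abstract does not specify the specialized polynomial), an assumed cost model in which FLOPs scale as `d * w**2 * r**2`, and a plain grid search in place of whatever solver the paper uses. All function names (`features`, `fit_polynomial`, `relative_flops`, `best_config`, and so on) are hypothetical, and the accuracy samples in the usage section are synthetic.

```python
from itertools import product


def features(d, w, r):
    """Full quadratic basis in the depth, width and resolution ratios (assumed form)."""
    return [1.0, d, w, r, d * d, w * w, r * r, d * w, d * r, w * r]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(samples, ridge=0.0):
    """Least-squares fit of accuracy ~ poly(d, w, r) via the normal equations.

    samples: list of (d, w, r, accuracy) tuples, e.g. measured on pruned models.
    ridge: optional small positive value to stabilise the fit with few samples.
    """
    rows = [features(d, w, r) for d, w, r, _ in samples]
    ys = [acc for *_, acc in samples]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, d, w, r):
    """Evaluate the fitted polynomial at one (d, w, r) configuration."""
    return sum(c * f for c, f in zip(coef, features(d, w, r)))


def relative_flops(d, w, r):
    """Assumed cost model: linear in depth, quadratic in width and in resolution."""
    return d * w * w * r * r


def best_config(coef, budget, grid):
    """Grid-search the fitted polynomial for the best (d, w, r) within the FLOPs budget."""
    feasible = [(d, w, r) for d, w, r in product(grid, repeat=3)
                if relative_flops(d, w, r) <= budget]
    return max(feasible, key=lambda cfg: predict(coef, *cfg))


if __name__ == "__main__":
    grid = [i / 10 for i in range(1, 11)]  # keep 10%, 20%, ..., 100% of each dimension

    def synthetic_accuracy(d, w, r):
        # Made-up smooth surface standing in for accuracies of pruned, fine-tuned models.
        return 0.5 + 0.2 * d + 0.3 * w + 0.25 * r - 0.1 * d * d - 0.15 * w * w - 0.1 * r * r

    samples = [(d, w, r, synthetic_accuracy(d, w, r))
               for d, w, r in product(grid[::3], repeat=3)]  # 4 x 4 x 4 = 64 samples
    coef = fit_polynomial(samples)
    print(best_config(coef, budget=0.5, grid=grid))
```

A grid search is enough here because the search space has only three dimensions; with a finer grid or a different polynomial, a constrained continuous optimizer could be swapped in without changing the fitting step.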