Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition
Abstract. Integrated convolutional neural network (CNN) and deep bidirectional long short-term memory (DBLSTM) based character models have achieved excellent recognition accuracies on optical character recognition (OCR) tasks, but at the cost of a large number of model parameters and heavy computation. To deploy CNN-DBLSTM models in products served on CPU servers, there is an urgent need to compress and accelerate the model as much as possible, especially the CNN part, which dominates both the parameter count and the computation. In this paper, we study teacher-student learning and Tucker decomposition methods to reduce model size and runtime latency for CNN-DBLSTM based character models for OCR. We use teacher-student learning to transfer the knowledge of a large-size teacher model to a small-size compact student model, followed by Tucker decomposition to further compress the student model. For teacher-student learning, we design a novel learning criterion that brings in the guidance of the succeeding LSTM layer when matching the CNN-extracted feature sequences of the large teacher and small student models. Experimental results on large-scale handwritten and printed OCR tasks show that teacher-student learning alone achieves a 9.90× footprint reduction and a 15.23× inference speedup without degrading recognition accuracy. Combined with Tucker decomposition, the model can be compressed and accelerated further: the decomposed model achieves an 11.89× footprint reduction and a 22.16× inference speedup with no or only a small degradation in recognition accuracy compared with the large-size baseline model.
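To make the Tucker decomposition step concrete, the sketch below shows one common way a single convolution layer can be factorized along its channel modes, so that one Conv2d is replaced by a 1×1 conv, a smaller spatial conv, and another 1×1 conv. This is an illustrative PyTorch sketch only, not the paper's implementation: the ranks r_in and r_out and the HOSVD-style factor computation are assumptions introduced for exposition.

```python
import torch
import torch.nn as nn


def mode_factor(weight: torch.Tensor, mode: int, rank: int) -> torch.Tensor:
    """Leading left-singular vectors of the mode-n unfolding (HOSVD-style factor)."""
    unfolded = weight.movedim(mode, 0).reshape(weight.shape[mode], -1)
    u, _, _ = torch.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank]  # shape: (dim of that mode, rank)


def tucker2_conv(conv: nn.Conv2d, r_in: int, r_out: int) -> nn.Sequential:
    """Approximate one Conv2d by a Tucker-2 factorization of its kernel."""
    W = conv.weight.data                     # (C_out, C_in, kH, kW)
    U_out = mode_factor(W, 0, r_out)         # (C_out, r_out)
    U_in = mode_factor(W, 1, r_in)           # (C_in, r_in)
    # Core tensor: project the kernel onto the two channel-mode subspaces.
    core = torch.einsum('oikl,or,is->rskl', W, U_out, U_in)  # (r_out, r_in, kH, kW)

    # 1x1 conv projecting C_in input channels down to r_in channels.
    first = nn.Conv2d(conv.in_channels, r_in, kernel_size=1, bias=False)
    first.weight.data = U_in.t().reshape(r_in, conv.in_channels, 1, 1)

    # Small spatial conv operating in the reduced channel space.
    middle = nn.Conv2d(r_in, r_out, kernel_size=conv.kernel_size,
                       stride=conv.stride, padding=conv.padding, bias=False)
    middle.weight.data = core

    # 1x1 conv expanding r_out channels back to C_out output channels.
    last = nn.Conv2d(r_out, conv.out_channels, kernel_size=1,
                     bias=conv.bias is not None)
    last.weight.data = U_out.reshape(conv.out_channels, r_out, 1, 1)
    if conv.bias is not None:
        last.bias.data = conv.bias.data
    return nn.Sequential(first, middle, last)
```

With kernel size k, the multiply count per spatial position drops roughly from C_out·C_in·k² to C_in·r_in + r_in·r_out·k² + r_out·C_out, which is where the additional speedup over the student model comes from; in practice the decomposed layers would be fine-tuned afterwards to recover any lost accuracy.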