An Encoder-Decoder Approach to Handwritten Mathematical Expression Recognition with Multi-head Attention and Stacked Decoder

2021 International Conference on Document Analysis and Recognition

Published by Springer, Cham

The encoder-decoder framework with an attention mechanism has become a mainstream solution to handwritten mathematical expression recognition (HMER) since the “watch, attend and parse” (WAP) approach was proposed in 2017, in which a convolutional neural network is used as the encoder and a gated recurrent unit with attention is used in the decoder. Inspired by the recent success of the Transformer in many applications, in this paper we adopt its multi-head attention and stacked decoder designs to improve the decoder part of the WAP framework for HMER. Experimental results on CROHME tasks show that multi-head attention boosts the expression recognition rate (ExpRate) of WAP from 54.32%/58.05% to 56.76%/59.72% on the CROHME 2016/2019 test sets, and that the stacked decoder further improves ExpRate to 57.72%/61.38%.
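
To make the idea of multi-head attention in the decoder more concrete, the following is a minimal NumPy sketch of one decoding step attending over flattened CNN feature-map positions. It is an illustration only, not the paper's implementation: it assumes Transformer-style scaled dot-product attention (the abstract does not specify the scoring function, and the original WAP attention differs), and the function name, weight matrices, and dimensions (`d_model`, `num_heads`, a 6x20 feature map) are hypothetical choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, features, w_q, w_k, w_v, w_o, num_heads):
    """One decoding step of multi-head attention (illustrative sketch).

    query:    (d_model,)    decoder state at the current step (assumed)
    features: (L, d_model)  encoder feature map flattened to L positions
    Returns the context vector (d_model,) and per-head weights (num_heads, L).
    """
    L, d_model = features.shape
    d_head = d_model // num_heads

    # Project, then split the channel dimension into heads.
    q = (query @ w_q).reshape(num_heads, d_head)          # (H, d_head)
    k = (features @ w_k).reshape(L, num_heads, d_head)    # (L, H, d_head)
    v = (features @ w_v).reshape(L, num_heads, d_head)    # (L, H, d_head)

    # Each head scores every feature-map position independently.
    scores = np.einsum("hd,lhd->hl", q, k) / np.sqrt(d_head)  # (H, L)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values per head, then concatenate and project.
    heads = np.einsum("hl,lhd->hd", weights, v)           # (H, d_head)
    context = heads.reshape(d_model) @ w_o                # (d_model,)
    return context, weights

# Hypothetical sizes: a 6x20 feature map with 64 channels and 8 heads.
rng = np.random.default_rng(0)
d_model, num_heads, L = 64, 8, 6 * 20
features = rng.standard_normal((L, d_model))
query = rng.standard_normal(d_model)
w_q, w_k, w_v, w_o = (
    rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
)

context, weights = multi_head_attention(
    query, features, w_q, w_k, w_v, w_o, num_heads
)
print(context.shape, weights.shape)  # (64,) (8, 120)
```

Each row of `weights` is a separate attention distribution over the feature-map positions, which is what lets different heads focus on different regions of the expression image at the same decoding step. A stacked decoder would, under the same assumptions, apply several such attention layers in sequence.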