Improving Handwritten OCR with Augmented Text Line Images Synthesized from Online Handwriting Samples by Style-Conditioned GAN

2020 International Conference on Frontiers in Handwriting Recognition

Published by IEEE


Large amounts of training data and deep learning technologies have greatly improved the performance of modern handwritten optical character recognition (OCR) systems. However, collecting and labeling massive numbers of handwriting images is both time-consuming and expensive. In this paper, we propose to augment handwritten OCR training with online handwriting samples. To achieve this goal, we propose a style-conditioned generative adversarial network (SC-GAN) with a novel training data pair generation strategy. This network is then used to transfer the styles of real handwriting images to skeleton images extracted from online handwriting samples, generating photo-realistic text line images. Experimental results on a large-scale handwritten OCR task show that the recognition accuracy of our handwritten OCR system is improved by using the augmented synthetic training data.
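
To make the notion of a skeleton image concrete, the following is a minimal sketch of how an online handwriting sample (pen trajectories) could be rasterized into a one-pixel-wide binary image. The abstract does not describe the extraction procedure, so everything here is an assumption for illustration: the function names `bresenham` and `render_skeleton`, the stroke format (a non-empty list of strokes, each a list of (x, y) points), the fixed output height, and the use of Bresenham line drawing. The SC-GAN itself is not shown.

```python
import numpy as np


def bresenham(x0, y0, x1, y1):
    """Yield the integer pixels on the line from (x0, y0) to (x1, y1)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_skeleton(strokes, height=64, pad=2):
    """Rasterize pen strokes into a binary skeleton image (hypothetical helper).

    strokes: non-empty list of strokes, each a list of (x, y) pen coordinates.
    Returns a uint8 array of shape (height, width) with 1 on the pen path.
    """
    pts = np.array([p for s in strokes for p in s], dtype=float)
    mins = pts.min(axis=0)
    span = pts.max(axis=0) - mins
    # Scale so the ink fills the image height; keep the aspect ratio.
    scale = (height - 1 - 2 * pad) / span[1] if span[1] > 0 else 1.0
    width = int(round(span[0] * scale)) + 1 + 2 * pad
    img = np.zeros((height, width), dtype=np.uint8)
    for stroke in strokes:
        px = [
            (int(round((x - mins[0]) * scale)) + pad,
             int(round((y - mins[1]) * scale)) + pad)
            for x, y in stroke
        ]
        if len(px) == 1:  # a single dot
            img[px[0][1], px[0][0]] = 1
        # Connect consecutive pen positions within one stroke only.
        for (x0, y0), (x1, y1) in zip(px, px[1:]):
            for x, y in bresenham(x0, y0, x1, y1):
                img[y, x] = 1
    return img


# Usage: two crossing strokes forming an "X".
skeleton = render_skeleton([[(0, 0), (10, 10)], [(10, 0), (0, 10)]], height=16)
```

In a pipeline like the one the abstract outlines, an image of this kind would be the content input whose style is replaced by that of a real handwriting image; how the paper pairs skeletons with real images for training is not covered by this sketch.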