A Unified Framework for Recognizing Handwritten Chemical Expressions

Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR 2009)

Chemical expressions exhibit more varied two-dimensional structures than mathematical equations do. In this paper we propose a unified framework for recognizing handwritten chemical expressions, covering both inorganic and organic expressions. We present a set of novel statistical algorithms for the two key components of this framework: symbol grouping and structure analysis. Non-symbol modeling and inter-group modeling are proposed to improve grouping results, and bond modeling is proposed so that the special bond symbols can be grouped within the same framework. A graph-based representation (CESG) is defined for representing generic chemical expressions, and the structure analysis problem is formulated as a search problem for CESG over a weighted direction graph. Experiments were conducted on a database of more than 35,000 expressions, and the results are presented.
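
To make the idea of structure analysis as a search over a weighted graph more concrete, here is a minimal sketch. It is not the paper's algorithm or its CESG definition: the node names, relation labels, scores, and the function `best_structure` are all hypothetical, and an exhaustive search stands in for whatever search strategy the authors use. Each symbol group is a node, each candidate spatial relation is a weighted edge, and the search picks one incoming relation per group so that the result is a tree with the highest total score.

```python
from itertools import product


def _reaches_root(node, parent, root):
    """Follow parent links from node; return False if a cycle is hit first."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent[node]
    return True


def best_structure(nodes, edges, root):
    """Exhaustively choose one incoming edge per non-root node.

    edges: list of (parent, child, relation, weight) tuples (hypothetical format).
    Returns (total_weight, chosen_edges), or None if no tree spans all nodes.
    """
    incoming = {n: [e for e in edges if e[1] == n] for n in nodes if n != root}
    if any(not candidates for candidates in incoming.values()):
        return None
    best = None
    for choice in product(*incoming.values()):
        parent = {child: par for par, child, _, _ in choice}
        # Reject selections that contain a cycle instead of a rooted tree.
        if not all(_reaches_root(n, parent, root) for n in parent):
            continue
        total = sum(w for _, _, _, w in choice)
        if best is None or total > best[0]:
            best = (total, list(choice))
    return best


# Toy input for a handwritten "H2O"; the scores are made-up log-likelihoods.
nodes = ["H", "2", "O"]
edges = [
    ("H", "2", "subscript", -0.1),
    ("H", "2", "right", -1.5),
    ("H", "O", "right", -0.3),
    ("2", "O", "right", -0.9),
]
result = best_structure(nodes, edges, root="H")
print(result)
```

The brute-force enumeration is only practical for a handful of symbol groups; it is used here because it is easy to verify, not because it reflects how a real recognizer would scale.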