OCR and Document Understanding

We have been developing state-of-the-art technologies and industry-leading product solutions for the following scenarios: (1) Universal OCR to detect and recognize any text in an image or PDF; (2) Universal math OCR to detect and recognize any math expression in an image or PDF; (3) Universal table understanding to detect, recognize, and understand any table in an image or PDF; (4) Universal layout analysis to detect page objects such as text blocks, lists, tables, math equations, and figures in any image or PDF, identify their relationships, and determine the reading order of body text; (5) Universal information extraction to extract entities, key/value pairs, item lists, and other intended information from any image or PDF document; (6) Synthetic data generation for the above scenarios to reduce cost, improve accuracy, and increase the speed of innovation.

People

Qiang Huo

Partner Research Manager