Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks
- Haoyu Dong,
- Zhoujun Cheng,
- Xinyi He,
- Mengyu Zhou,
- Anda Zhou,
- Fan Zhou,
- Ao Liu,
- Shi Han,
- Dongmei Zhang
IJCAI'2022 Survey Track
Following the success of pre-training techniques in the natural language domain, a flurry of table pre-training frameworks have been proposed, achieving new state-of-the-art results on downstream tasks such as table question answering, table type recognition, column relation classification, table search, and formula prediction. To best leverage the characteristics of structured tables, various model architectures have been explored, particularly specially-designed attention mechanisms. Moreover, to fully exploit the supervision signals in unlabeled tables, diverse pre-training objectives have been designed and evaluated, for example, denoising cell values, predicting numerical relationships, and learning a neural SQL executor. This survey aims to provide a comprehensive review of model designs, pre-training objectives, and downstream tasks for table pre-training, and we further share our thoughts on existing challenges and future opportunities.
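To give a concrete feel for one of the objectives mentioned above, the snippet below is a minimal, purely illustrative sketch of how a "denoising cell values" setup could be prepared: a table is flattened into a token sequence and a random subset of cell values is masked, so a model would be trained to recover the original values. The toy table, the `MASK` token, and `make_masked_example` are hypothetical names introduced here for illustration, not code from the surveyed frameworks.

```python
import random

# Hypothetical toy table; real frameworks typically also encode row/column
# structure, but this sketch only shows the cell-masking idea.
table = {
    "header": ["Country", "Capital", "Population (M)"],
    "rows": [
        ["France", "Paris", "67.8"],
        ["Japan", "Tokyo", "125.7"],
    ],
}

MASK = "[MASK]"

def make_masked_example(table, mask_prob=0.3, seed=0):
    """Flatten the table row by row and mask random cell values.

    Returns (tokens, labels) where labels[i] holds the original value for a
    masked position and None elsewhere, in the spirit of an MLM-style
    cell-denoising objective.
    """
    rng = random.Random(seed)
    tokens = list(table["header"])
    labels = [None] * len(table["header"])  # header tokens are never masked here
    for row in table["rows"]:
        for cell in row:
            if rng.random() < mask_prob:
                tokens.append(MASK)
                labels.append(cell)  # the model must recover this value
            else:
                tokens.append(cell)
                labels.append(None)
    return tokens, labels

inputs, targets = make_masked_example(table)
print(inputs)
print(targets)
```

In practice, frameworks differ in how cells are corrupted (masking, swapping, or replacing values) and in how table structure is injected (row/column embeddings or structure-aware attention); the sketch above only fixes the input/target pairing.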