Wiki2row – the In’s and Out’s of Row Suggestion with a Large Scale Knowledge Base
- Alperen Karaoglu
- Carina Negreanu
- Shuang Chen
- Jack Williams
- Dany Fabian
- Andy Gordon
- Chin-Yew Lin
Row suggestion, a generalization of set expansion, is the task of augmenting a given table of text and numbers with additional, relevant rows. One way to make suggestions trustworthy is to ground candidates in a verifiable source; in this work we focus on knowledge bases, and on Wikidata in particular. Our pipeline begins by linking existing rows and columns to entities and properties. The primary focus of this work is to improve candidate generation and ranking without requiring in-domain training or fine-tuning. Our novel contributions are to account for semantic information, using BigGraph embeddings and GPT-3 free-text generation, and for tabular information, by differentiating between in-table properties (explicit in the given columns) and out-of-table properties (implicit from the knowledge base). Measured on the WikiTables benchmark, our solution performs comparably to or better than previous state-of-the-art systems that use small, carefully curated knowledge bases (such as DBpedia). We extend our algorithm to present the first approach to bias-aware row suggestion for cases where table completion is not achievable, that is, when we cannot define a complete set of entities. We also suggest quantitative measures to evaluate performance on this task.
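To make the distinction between in-table and out-of-table properties concrete, here is a minimal sketch of candidate generation and ranking over a toy knowledge base. It is an illustration under stated assumptions, not the pipeline described above: the `KB` dictionary, the property names, the `suggest_rows` helper, and the additive scoring rule are all hypothetical, and the sketch uses neither BigGraph embeddings nor GPT-3.

```python
from collections import Counter

# Hypothetical toy knowledge base (entity -> {property: value}).
# Values are rough and illustrative; this is not Wikidata.
KB = {
    "France":  {"instance_of": "country", "continent": "Europe", "capital": "Paris", "population": 68},
    "Germany": {"instance_of": "country", "continent": "Europe", "capital": "Berlin", "population": 84},
    "Italy":   {"instance_of": "country", "continent": "Europe", "capital": "Rome", "population": 59},
    "Japan":   {"instance_of": "country", "continent": "Asia", "capital": "Tokyo", "population": 125},
    "Paris":   {"instance_of": "city", "continent": "Europe", "population": 2},
}


def suggest_rows(seeds, in_table_props, kb):
    """Rank non-seed entities as candidate rows (assumed scoring, for illustration).

    seeds: entities already linked to the table's rows.
    in_table_props: properties already linked to the table's columns.
    """
    # Out-of-table evidence: (property, value) pairs that every seed shares
    # and that are not already represented by a column.
    counts = Counter(
        (prop, value)
        for entity in seeds
        for prop, value in kb[entity].items()
        if prop not in in_table_props
    )
    shared = {pv for pv, n in counts.items() if n == len(seeds)}

    scored = []
    for entity, props in kb.items():
        if entity in seeds:
            continue
        # Fraction of shared out-of-table pairs the candidate matches.
        out_score = sum(1 for p, v in shared if props.get(p) == v) / max(len(shared), 1)
        # Fraction of in-table columns the knowledge base could fill for it.
        in_score = sum(1 for p in in_table_props if p in props) / max(len(in_table_props), 1)
        if out_score > 0:
            scored.append((entity, out_score + in_score))

    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


ranking = suggest_rows(["France", "Germany"], ["capital", "population"], KB)
print(ranking)
```

With the seeds France and Germany and the columns `capital` and `population`, the shared out-of-table evidence is "is a country" and "is in Europe", so Italy ranks first, Japan second, and Paris (which has no `capital` value to fill) last.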