Concept Expansion Using Web Tables
- Chi Wang
- Kaushik Chakrabarti
- Yeye He
- Kris Ganjam
- Zhimin Chen
- Phil Bernstein
Proceedings of the 2015 International World Wide Web Conference
We study the following problem: Given the name of an ad-hoc concept as well as a few seed entities belonging to the concept, output all entities belonging to it. Since producing the exact set of entities is hard, we focus on returning a ranked list of entities belonging to the concept. Previous approaches either use seed entities as the only input, or inherently require negative examples. They suffer from input ambiguity and semantic drift, or are not viable options for ad-hoc tail concepts. In this paper, we propose to leverage the millions of tables on the web for this problem. The core technical challenge is to identify the “exclusive” tables for a concept to prevent semantic drift; existing holistic ranking techniques like personalized PageRank are inadequate for this purpose. We develop novel probabilistic ranking methods based on a new type of table-entity relationship. Experiments with real-life concepts show that our proposed solution is significantly more effective than applying state-of-the-art set expansion or holistic ranking techniques.