MetaInsight: Automatic Discovery of Structured Knowledge for Exploratory Data Analysis
- Pingchuan Ma
- Rui Ding
- Shi Han
- Dongmei Zhang
ACM SIGMOD International Conference on Management of Data
Organized by the Association for Computing Machinery
Automatic Exploratory Data Analysis (EDA) focuses on automatically discovering pieces of knowledge in the form of interesting data patterns. However, the knowledge conveyed by these suggested patterns is disjointed or lacks organization. As a result, it is difficult for users to gain structured knowledge, and as the number of suggested patterns grows, these stand-alone patterns become less likely to motivate users to conduct follow-up analysis, which prevents automatic EDA from being used effectively. In this paper, we propose MetaInsight, a structured representation of knowledge extracted from multi-dimensional data that aims to facilitate EDA automatically and effectively. Specifically, we propose a novel formulation of basic data patterns that captures the essential characteristics of the raw data distribution, enabling knowledge extraction. Then, based on the mined Homogeneous Data Patterns (HDPs) and inter-pattern similarity, a MetaInsight is identified by categorizing the basic data patterns within an HDP into commonness(es) and exceptions, thereby achieving a structured knowledge representation. The commonness(es) and exceptions concretize the knowledge obtained through induction and validation, two typical analysis mechanisms in EDA. We further propose a novel scoring function to quantify the usefulness of a MetaInsight, together with an effective and efficient mining procedure and a ranking algorithm, to automatically discover high-quality MetaInsights from multi-dimensional data. We demonstrate the effectiveness and efficiency of MetaInsights with respect to facilitating EDA through evaluation on real-world datasets and user studies with both expert and non-expert users.
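To make the commonness/exception idea concrete, here is a minimal sketch of splitting one HDP into commonness(es) and exceptions. It is not the authors' algorithm: the abstract does not define the pattern formulation, the similarity measure, or the scoring function. The sketch assumes, hypothetically, that an HDP is simply a list of basic patterns over sibling data scopes (e.g., the same measure for each value of one dimension), that two patterns are "similar" exactly when they share a coarse label, and that a label counts as a commonness when at least a fraction `min_share` of the HDP carries it. The names `BasicPattern`, `categorize_hdp`, and `min_share` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicPattern:
    # Hypothetical representation: the data scope the pattern was mined on
    # and a coarse pattern label such as "rising trend" or "outlier".
    scope: str
    label: str


def categorize_hdp(hdp, min_share=0.5):
    """Split an HDP (a list of BasicPattern) into commonness(es) and exceptions.

    Assumption (not from the paper): patterns are similar iff they share a
    label, and a label is a commonness if at least min_share of the HDP has it.
    """
    if not hdp:
        return [], []
    counts = Counter(p.label for p in hdp)
    common_labels = {lab for lab, c in counts.items() if c / len(hdp) >= min_share}
    # One group per commonness label, in a deterministic (sorted) order.
    commonness = [[p for p in hdp if p.label == lab] for lab in sorted(common_labels)]
    # Every pattern that does not follow a commonness is an exception.
    exceptions = [p for p in hdp if p.label not in common_labels]
    return commonness, exceptions


if __name__ == "__main__":
    hdp = [
        BasicPattern("Region=East", "rising trend"),
        BasicPattern("Region=West", "rising trend"),
        BasicPattern("Region=North", "rising trend"),
        BasicPattern("Region=South", "falling trend"),
    ]
    commonness, exceptions = categorize_hdp(hdp)
    print([len(group) for group in commonness])  # [3]
    print([p.scope for p in exceptions])  # ['Region=South']
```

In this toy example, three of the four regions share a rising trend (the induced commonness), while South is flagged as the exception that a user might validate next. A real implementation would replace exact label matching with the paper's inter-pattern similarity and rank candidate MetaInsights with its scoring function.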