About
Data analysis is crucial for uncovering insights and making informed decisions in both business and everyday life. To give users and organizations techniques that automatically discover insights in their data, I have been conducting research on insights in the Data, Knowledge, and Intelligence (DKI) group of Microsoft Research Asia (now DKI at E+D).
My research focuses on two topics. The first is how to formalize the concept of an insight as a computable data entity, which is the fundamental problem of insight discovery. The second is interpretability and causality, which are essential for making insights explainable and reliable. This insight-related research has also led to a series of technology transfers that ship as AI-powered analysis features in Microsoft products, including Power BI Quick Insights, Excel Analyze Data, and Forms Insights.
Selected Publications:
Insights
- [SIGMOD’19] QuickInsights: Quick and Automatic Discovery of Insights from Multi-Dimensional Data first proposes a general formulation of insight along with a systematic insight-mining framework.
- [SIGMOD’17] [SIGMOD’21] Extracting Top-K Insights from Multi-dimensional Data and MetaInsight: Automatic Discovery of Structured Knowledge for Exploratory Data Analysis further extend the framework from the perspectives of compound calculus and compound knowledge, respectively.
- [SIGMOD’23] XInsight: eXplainable Data Analysis Through the Lens of Causality provides data analysis with qualitative and quantitative explanations of causal and non-causal semantics, enabling more accurate data interpretation and decision-making in real-world scenarios.
- [EMNLP’23 Demo] InsightPilot: An LLM-Empowered Automated Data Exploration System is an LLM-based system designed to simplify and automate the data exploration process.
Causality
- Supervised Causal Learning [KDD’22][SDM’23]: For causal discovery (also known as causal structure learning), ML4C: Seeing Causality Through Latent Vicinity classifies edge orientations on the skeleton of an input causal graph and explores the benefits of supervision in causal discovery. This work is followed by ML4S: Learning Causal Skeleton from Vicinal Graphs, which discusses how to learn a causal skeleton in a self-supervised learning setting.
- Skeleton Learning as the Foundation [AAAI’20][KDD’22][KDD’24]: The skeleton represents persistent dependencies between pairs of variables. Persistent dependency is a general property that should be evaluated before causal orientation is performed. Reliable and Efficient Anytime Skeleton Learning advocates treating skeleton learning as a standalone task; ML4S: Learning Causal Skeleton from Vicinal Graphs is a first attempt to use supervised learning to predict the causal skeleton, and it proposes self-augmentation to acquire in-domain training data; Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior verifies that the learned skeleton consistently benefits a wide range of causal orientation tasks compared with not using a skeleton.
XAI
- [KDD’22][ESWA’24] pureGAM: Learning an Inherently Pure Additive Model extends Generalized Additive Models (GAMs) to model higher-order interactions among variables. It is the first such model to remain identifiable and interpretable when modeling both categorical and numerical variables together with their interactions. FXAM: A Unified and Fast Interpretable Model for Predictive Analytics extends the modeling capability of GAMs with a unified additive model for numerical, categorical, and temporal features, addressing the one-to-many and many-to-one phenomena that commonly appear in real-world scenarios.
Project Media Links:
- Ideas in Excel – Office Support (microsoft.com)
- AI-Empowered Excel: Massive Data, One-Click Analysis (qq.com)
- Science Craftsmen | Research in Data Intelligence with Feet on the Ground and Eyes on the Stars (qq.com)
- Quick Insights | Microsoft Power BI Blog | Microsoft Power BI
- Generate data insights on your dataset automatically – Power BI | Microsoft Docs
- Welcome to Forms Ideas – Analyze Your Response Data Smartly in Forms