PolySem: Efficient Polyglot Analytics on Semantic Data
- Xinyu Liu
- Venkatesh Emani
- Avrilia Floratou
- Joyce Cahoon
- Philip Seamark
- Carlo Curino
Poly'23: Polystore systems for heterogeneous data in multiple databases with privacy and security assurances
Data scientists and data engineers spend a significant portion of their time understanding, cleaning, and transforming their data before they can begin any meaningful analysis. Most database vendors provide business intelligence (BI) tools as efficient, user-friendly platforms on which customers perform data cleaning, preparation, and linking tasks to obtain actionable semantic data. However, customers are increasingly interested in querying semantic data through various query modalities, including SQL, imperative programming languages such as Python, and natural language queries. Currently, customers are limited to using either the visual interfaces provided by these tools or languages that are specific to the particular tool. In this proposal, we describe techniques that enable the execution of user queries expressed in different modalities on semantic datasets without exporting data out of the BI system. Our techniques comprise automatic translation of user queries into a language-agnostic representation of data processing operations, followed by translation of that representation into the specific query language that the BI engine can execute. Our evaluation results on business intelligence and decision support benchmarks suggest significant improvements in query performance compared to other popular data processing engines.