Demonstration of Geyser: Provenance Extraction and Applications over Data Science Scripts

ACM SIGMOD |

As enterprises have started developing and deploying complicated data science workloads at scale, the need for mechanisms that enable enterprise-grade data science (e.g., compliance or auditing) has become more pronounced. In this paper, we present Geyser, an extensible provenance system for data science workloads that can be used as a foundation for enterprise-grade data science. Our system supports both static and dynamic provenance, over a wide range of data science scripts, driven by a knowledge base of data science APIs. We demonstrate the wide applicability of the system using various industrial applications: provenance extraction, model compliance, model linting, model versioning, and poisoning detection. A video of the demonstration is available at https://aka.ms/geyserdemo.