PRETZEL: opening the black box of machine learning prediction serving systems
- Yunseong Lee
- Alberto Scolari
- Byung-Gon Chun
- Marco Domenico Santambrogio
- Matteo Interlandi
- Markus Weimer
Operating Systems Design and Implementation
Published by USENIX Association
Machine learning models are often composed of pipelines of transformations. While this design allows individual model components to be executed efficiently at training time, prediction serving has different requirements, such as low latency, high throughput, and graceful performance degradation under heavy load. Current prediction serving systems treat models as black boxes, so prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system that introduces a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL improves performance along several dimensions: compared to state-of-the-art approaches, PRETZEL on average reduces 99th percentile latency by 5.5× and memory footprint by 25×, while increasing throughput by 4.7×.
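To make the idea of multi-model optimization more concrete, the following is a minimal illustrative sketch, not PRETZEL's actual API or implementation. It assumes a serving system that can see the individual stages of each pipeline (the white box view) and can therefore keep a single copy of any stage that several pipelines have in common. All names here (`Stage`, `StageRegistry`, `register_pipeline`, and the example stage kinds and parameters) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical types for illustration only; not taken from PRETZEL.
@dataclass(frozen=True)
class Stage:
    kind: str      # e.g. "tokenize", "ngram", "linear"
    params: Tuple  # hashable parameters that identify the stage's state


class StageRegistry:
    """Keeps one physical copy of each distinct (kind, params) stage."""

    def __init__(self) -> None:
        self._stages: Dict[Stage, Stage] = {}

    def intern(self, stage: Stage) -> Stage:
        # Return the already-registered equal stage if there is one,
        # otherwise register and return the given stage.
        return self._stages.setdefault(stage, stage)

    def __len__(self) -> int:
        return len(self._stages)


def register_pipeline(registry: StageRegistry, stages: List[Stage]) -> List[Stage]:
    # A white box system sees each stage and can deduplicate it;
    # a black box system would only see one opaque model per pipeline.
    return [registry.intern(s) for s in stages]


registry = StageRegistry()

sentiment = register_pipeline(registry, [
    Stage("tokenize", ("en",)),
    Stage("ngram", (1, 2)),
    Stage("linear", ("sentiment-weights",)),
])

spam = register_pipeline(registry, [
    Stage("tokenize", ("en",)),
    Stage("ngram", (1, 2)),
    Stage("linear", ("spam-weights",)),
])

# Six stages were declared, but only four distinct ones are stored.
print(len(registry))
```

In this sketch the two pipelines share their tokenizer and n-gram stages and differ only in the final linear model, so the registry holds four stages rather than six. Whether and how a real system shares state between pipelines depends on its design; the sketch only shows why visibility into pipeline structure makes such sharing possible.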