AutoSys: The Design and Operation of Learning-Augmented Systems
- Chieh-Jan Mike Liang
- Hui Xue
- Mao Yang
- Lidong Zhou
- Lifei Zhu
- Zhao Lucis Li
- Zibo Wang
- Qi Chen
- Quanlu Zhang
- Chuanjie Liu
- Wenjun Dai
ATC (USENIX Annual Technical Conference)
Published by USENIX | Organized by USENIX
Although machine learning (ML) and deep learning (DL) open new possibilities for optimizing system design and performance, taking advantage of this paradigm shift requires more than implementing existing ML/DL algorithms. This paper reports our years of experience in designing and operating several production learning-augmented systems at Microsoft. AutoSys is a framework that unifies the development process, and it addresses common design considerations including ad-hoc and nondeterministic jobs, learning-induced system failures, and programming extensibility. Furthermore, this paper demonstrates the benefits of adopting AutoSys with measurements from one production system, Web Search. Finally, we share long-term lessons stemming from unforeseen implications that have surfaced over the years of operating learning-augmented systems.