Toward ML-Centric Cloud Platforms: Opportunities, Designs, and Experience with Microsoft Azure
- Ricardo Bianchini
- Marcus Fontoura
- Eli Cortez
- Anand Bonde
- Alexandre Muzio
- Ana-Maria Constantin
- Thomas Moscibroda
- Gabriel Magalhaes
- Girish Bablani
- Mark Russinovich
Communications of the ACM, Vol. 63(2)
Services that rely heavily on machine learning (ML), such as speech understanding and image recognition, have been receiving significant attention. ML can also be used to optimize the cloud platforms that underlie these services. The challenge is defining exactly how ML should be integrated into these platforms. In this paper, we motivate this integration, discuss the multiple dimensions associated with it and the architectural tradeoffs along each dimension, and describe our first approach to building ML-centric cloud platforms. Specifically, as one point in this multi-dimensional space, we present an overview of the Resource Central ML and prediction-serving system, its integration into the Microsoft Azure Compute fabric, and the lessons we learned. We conclude with a discussion of future research avenues.