Multitenancy in Autopilot

Established: June 2, 2014

We leverage spare capacity in Bing to run batch workloads (e.g., data analytics). This project has led to several research contributions:

This project originally aimed to harvest idle resources in large-scale datacenters. We leverage the historical patterns of primary tenants to harvest both compute and storage from latency-sensitive services. The research details are described in our paper at OSDI 2016.
We needed to scale HDFS to tens of thousands of servers. The research details are described in our paper at USENIX ATC 2017. We built HDFS Router-based Federation, which has been contributed back to Apache Hadoop.
The large, heterogeneous environment was prone to long tail latencies. Our proposal for managing tail latency when accessing data in HDFS is described in our paper at EuroSys 2019.

People

Technical Fellow, Corporate Vice President, Microsoft Azure

Senior Principal Researcher

Principal Research SDE