Achieving Microsecond-Scale Tail Latency Efficiently with Approximate Optimal Scheduling
Datacenter applications expect microsecond-scale service times and tightly bounded tail latency, and future workloads are expected to be even more demanding. To address this challenge, state-of-the-art runtimes employ theoretically optimal scheduling policies, namely a single request queue and strict preemption.
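To illustrate why the single-queue, preemptive discipline bounds tail latency, the following sketch (purely illustrative, not Concord's implementation; the quantum size and request mix are made up) simulates a single shared run queue in which a long request is preempted every fixed quantum so that short requests behind it are not delayed:

```python
from collections import deque

QUANTUM = 100  # work units executed between preemption points (assumed)

def schedule(requests):
    """Round-robin over one shared queue; a request is preempted and
    re-enqueued after each quantum, approximating strict preemption."""
    queue = deque(requests)  # each entry: (request id, remaining work)
    clock = 0
    completion = {}          # request id -> completion time
    while queue:
        rid, remaining = queue.popleft()
        slice_ = min(remaining, QUANTUM)
        clock += slice_
        remaining -= slice_
        if remaining:
            queue.append((rid, remaining))  # preempted: back of the queue
        else:
            completion[rid] = clock
    return completion

# A 1000-unit request arrives ahead of ten 10-unit requests. With
# preemption, every short request finishes within 200 time units;
# under run-to-completion FIFO, each would wait behind the full
# 1000-unit request.
times = schedule([("long", 1000)] + [(f"short{i}", 10) for i in range(10)])
```

The sketch shows only the queueing-theoretic intuition; the engineering challenge the paper targets is delivering such preemption cheaply at microsecond scale.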
We present Concord, a runtime that demonstrates how forgoing this design, while still closely approximating it, enables a significant improvement in application throughput while maintaining tight tail-latency SLOs. We evaluate Concord on microbenchmarks and on Google’s LevelDB key-value store; compared to the state of the art, Concord improves application throughput by up to 52% on microbenchmarks and by up to 83% on LevelDB, while meeting the same tail-latency SLOs. Unlike the state of the art, Concord is application-agnostic and does not rely on nonstandard use of hardware, which makes it immediately deployable in the public cloud. Concord is publicly available at https://dslab.epfl.ch/research/concord.