007: Democratically Finding the Cause of Packet Drops
- Behnaz Arzani
- Selim Ciraci
- Luiz Chamon
- Yibo Zhu
- Hongqiang Liu
- Jitu Padhye
- Boon Thau Loo
- Geoff Outhred
NSDI '18
Network failures continue to plague datacenter operators, as their symptoms may not correlate directly with where or why they occur. We introduce 007, a lightweight, always-on diagnosis application that can find problematic links and also pinpoint problems for each TCP connection. 007 is completely contained within the end host. During its two-month deployment in a tier-1 datacenter, it detected every problem found by previously deployed monitoring tools while also finding the sources of other, previously undetected problems.