Validating datacenters at scale

  • ,
  • ,
  • ,
  • Amar Agrawal ,
  • Ashish Bhargava ,
  • Paul-Andre C. Bissonnette ,
  • Shane Foster ,
  • Andrew Helwer ,
  • M. Kasten ,
  • I. Lee ,
  • Anup Namdhari ,
  • Haseeb Niaz ,
  • Aniruddha Parkhi ,
  • Hanukumar Pinnamraju ,
  • Adrian Power ,
  • N. Raje ,
  • Parag Sharma

SIGCOMM 2019 |

Publication

We describe our experiences using formal methods and automated theorem proving for network operation at scale. The experiences are based on developing and applying the SecGuru and RCDC (Reality Checker for Data Centers) tools in Azure. SecGuru has been used since 2013 and thus, is arguably a pioneering industrial deployment of network verification. SecGuru is used for validating ACLs and more recently RCDC checks forwarding tables at Azure scale. A central technical angle is that we use local contracts and local checks, that can be performed at scale in parallel, and without maintaining global snapshots, to validate global properties of datacenter networks. Specifications leverage declarative encodings of configurations and automated theorem proving for validation. We describe how intent is automatically derived from network architectures and verification is incorporated as prechecks for making changes, live monitoring, and for evolving legacy policies. We document how network verification, grounded in architectural constraints, can be integral to operating a reliable cloud at scale.