The growth of large-scale networked services has brought to the fore myriad challenges: performance, reliability, efficiency, cost, and more. Traditionally, work on addressing and balancing these has been done in silos. For instance, an application could make choices to optimize its own performance or cost, while treating the rest of the workload and infrastructure as outside its purview. However, such an approach breaks down when an application or service is so large that it defines the infrastructure.
The Network Brain project focuses on the holistic optimization of large-scale networked services. Our approach centers on a logically centralized "network brain", which pools together information from myriad sources to make informed decisions. While akin to the controller in software-defined networking, the brain has a broader ambit: it has knowledge not only of the network but also of signals from applications and even users. Such an approach is particularly apt in the first-party setting, where both the application/service and the underlying infrastructure are operated by the same entity. This opens up opportunities for optimization that cut across the layers.
The figure below depicts the overall architecture of the Network Brain, showing how signals from across the layers are pooled together to drive holistic optimization.
We have employed such holistic optimization in a number of contexts. For example, by combining information on the time-to-recovery (TTR) of network links with signals from the application on the deferability of the workload, we have been able to achieve significant savings in backup capacity during WAN capacity planning (NSDI 2022). As another example, by combining information on user participation in online conferencing with signals on the utilization of compute and network resources across a geo-distributed infrastructure, we can map calls to servers in a manner that yields good performance while minimizing cost (SIGCOMM 2023). To inform such holistic optimization, we have also done work on understanding the impact of network performance on application users (IMC 2021 and HotNets 2023).