Understanding the Micro-Behaviors of Hardware Offloaded Network Stacks with Lumina
- Zhuolong Yu ,
- Bowen Su ,
- Wei Bai ,
- Shachar Raindel ,
- Vladimir Braverman ,
- Xin Jin
Organized by ACM
Hardware offloaded network stacks are widely adopted in modern datacenters to meet the demand for high throughput, ultra-low latency and low CPU overhead. To fully leverage their exceptional performance, users need to have a deep understanding of their behaviors. Despite many efforts on testing software network stacks, hardware network stacks impose unique challenges to testing tools due to their kernel bypass nature and high performance.
In this paper, we present Lumina, a tool to test the correctness and performance of hardware network stacks. Lumina leverages network programmability to emulate various network scenarios at line rate. With user-friendly interfaces, Lumina enables developers to inject deterministic events, thus facilitating the development of precise and reproducible tests. Given the limited resource and flexibility of programmable network devices, we mirror all the packets to dedicated servers and dump them for offline analysis. We leverage Lumina to test four RDMA NICs from NVIDIA and Intel, and identify bugs that can significantly degrade performance or mislead network operations. Lumina also enables us to capture unexpected micro-behaviors which are missing or not clearly described in public documents and specifications. Vendors have confirmed the critical bugs we discovered and will include bug fixes in future releases.