BatchIt: Optimizing Message-Passing Allocators for Producer-Consumer Workloads: An Intellectual Abstract
- Nathaniel Filardo
- Matthew J. Parkinson
Proceedings of the 2024 ACM SIGPLAN International Symposium on Memory Management (ISMM'24)
Published by ACM
Modern, high-performance memory allocators must scale
to a wide array of uses, including producer-consumer workloads.
In such workloads, objects are allocated by one thread
and deallocated by another; we call these remote deallocations.
These remote deallocations lead to contention on the
allocator’s synchronization mechanisms. Message-passing
allocators, such as mimalloc and snmalloc, use message
queues to communicate remote deallocations between threads.
These queues work well for producer-consumer workloads,
but there is room for optimization.
We propose and characterize BatchIt, a conceptually simple
optimization for such allocators: a per-slab cache of
remote deallocations that enables batching of objects destined
for the same slab. This optimization aims to exploit
naturally-arising locality of allocations, and it generalizes
across particular implementations; we have implementations
for both mimalloc and snmalloc. Multi-threaded, producer-consumer
benchmarks show improved performance from reduced
rates of atomic operations and cache misses in the underlying
allocator. Experimental results using the mimalloc-bench
suite and a custom message-passing workload show
that some producer-consumer workloads see over 20% performance
improvement, even on top of the high performance
these allocators already provide.
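
The batching idea described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation and does not reflect the internals of mimalloc or snmalloc. It assumes a hypothetical slab size, a hypothetical small direct-mapped cache keyed by slab, and it models each message sent to the owning thread as one atomic enqueue. All names (`SLAB_SIZE`, `CACHE_SLOTS`, `UnbatchedFreer`, `BatchingFreer`) are invented for illustration.

```python
SLAB_SIZE = 64 * 1024   # hypothetical slab size in bytes (assumption)
CACHE_SLOTS = 4         # hypothetical number of per-slab cache entries (assumption)


def slab_of(addr):
    """Map an object address to the index of the slab that contains it."""
    return addr // SLAB_SIZE


class UnbatchedFreer:
    """Baseline model: every remote deallocation becomes its own message."""

    def __init__(self):
        self.messages = []  # each message stands in for one atomic enqueue

    def remote_free(self, addr):
        self.messages.append([addr])

    def flush(self):
        pass  # nothing is held back, so there is nothing to flush


class BatchingFreer:
    """Toy per-slab cache: objects bound for the same slab share a message."""

    def __init__(self, slots=CACHE_SLOTS):
        self.slots = [None] * slots  # each slot holds (slab, [addrs]) or None
        self.messages = []

    def remote_free(self, addr):
        slab = slab_of(addr)
        i = slab % len(self.slots)       # direct-mapped lookup (assumption)
        entry = self.slots[i]
        if entry is not None and entry[0] == slab:
            entry[1].append(addr)        # hit: extend the pending batch
            return
        if entry is not None:
            self.messages.append(entry[1])  # miss: evict the old batch as one message
        self.slots[i] = (slab, [addr])

    def flush(self):
        # Send every pending batch, one message per cached slab.
        for i, entry in enumerate(self.slots):
            if entry is not None:
                self.messages.append(entry[1])
                self.slots[i] = None


# Eight objects from each of three slabs, freed in interleaved order.
addrs = [s * SLAB_SIZE + k * 64 for k in range(8) for s in range(3)]

unbatched = UnbatchedFreer()
batched = BatchingFreer()
for a in addrs:
    unbatched.remote_free(a)
    batched.remote_free(a)
unbatched.flush()
batched.flush()

print(len(unbatched.messages), "messages without batching")
print(len(batched.messages), "messages with per-slab batching")
```

In this model the benefit depends entirely on locality: when consecutive frees fall in a few slabs, many objects share one message, while frees scattered across many slabs cause evictions and approach the unbatched message count.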