MinuteSort with Flat Datacenter Storage
- Johnson Apacible ,
- Rich Draves ,
- Jeremy Elson ,
- Jinliang Fan ,
- Owen Hofmann ,
- Jon Howell ,
- Ed Nightingale ,
- Reuben Olinsky ,
- Yutaka Suzue
MSR-TR-2012-126 |
We have built a new high-performance distributed blob storage system, called Flat Datacenter Storage (FDS). Our MinuteSort entry is a relatively simple Daytona-class sort application that uses FDS for storage. We also have an Indy mode that is identical to the Daytona version except that input sampling is disabled and a uniform key distribution is assumed.
- In Indy mode, FDS sorted 1,470 GB in 59.4 s.
- In Daytona mode, FDS sorted 1,401 GB in 59.0 s.
The sorts were accomplished using a heterogeneous cluster consisting of 256 computers and 1,033 disks, divided broadly into two classes: storage nodes and compute nodes. Notably, no compute node in our system uses local storage for data; we believe FDS is the first system with competitive sort performance that uses remote storage. Because files are all remote, our 1,470 GB runs actually transmitted 4.4 TB over the network in under a minute. No strong assumptions are made around key or record lengths; keys and records of other lengths can be handled with only a performance-neutral configuration change.