Over the last decade, there has been tremendous growth in data-intensive applications and services in the cloud. Data is created on a variety of edge sources, e.g., devices, browsers, and servers, and processed by cloud applications to gain insights or make decisions. Applications and services either work on collected data, or monitor and process data in real time. These applications are typically update intensive and involve a large amount of state, more than can fit in main memory. However, they display significant temporal locality in their access patterns.
The FASTER project proposes a new design for storage systems, based on a tiered record-oriented storage organization called the hybrid log. Data in the hybrid log is accessed through a thread-scalable latch-free mechanism based on a new epoch protection framework. Data in the hybrid log moves across tiers in a self-organizing manner, while providing fast in-place-update capability to hot data in main memory. Based on this record storage design, the project offers two basic library primitives:
- FasterKV is a key-value store for point operations. It combines a highly cache-optimized concurrent hash index with a novel self-tuning data organization. It extends the standard key-value store interface to handle read-modify-writes and blind update operations. FASTER achieves orders-of-magnitude better throughput – up to 160M operations per second on a single machine – than alternative systems deployed widely today. It also exceeds the performance of pure in-memory data structures when the working set fits in memory.
- FasterLog is a thread-scalable log abstraction for writing, committing, and iterating over vast amounts of data on tiered storage. It can be used for scenarios such as write-ahead logging in databases, pub/sub, and cross-node reliable communication.
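To make the hybrid log idea more concrete, here is a minimal single-threaded sketch in Python. It is not FASTER's API or implementation: the class `ToyHybridLog`, its `mutable_size` parameter, and the method names are hypothetical, chosen only for illustration. It leaves out epoch protection, latch-free concurrency, and the storage tiers. It shows only the core behavior described above: records near the tail of the log are updated in place, while older records are treated as read-only and are updated by appending a new copy at the tail. It also shows how a blind update (`upsert`) differs from a read-modify-write (`rmw`).

```python
from typing import Callable, Dict, List, Optional


class ToyHybridLog:
    """Toy illustration only: an append-only record log plus a hash index.

    Assumption (not from the FASTER codebase): the last `mutable_size`
    records form the mutable region; everything older is read-only.
    """

    def __init__(self, mutable_size: int = 4) -> None:
        self.log: List[list] = []        # each record is [key, value]
        self.index: Dict[str, int] = {}  # key -> address of its latest record
        self.mutable_size = mutable_size

    @property
    def read_only_boundary(self) -> int:
        # Addresses below this boundary are treated as read-only.
        return max(0, len(self.log) - self.mutable_size)

    def _append(self, key: str, value: int) -> None:
        # New records always go to the tail, and the index points at them.
        self.log.append([key, value])
        self.index[key] = len(self.log) - 1

    def read(self, key: str) -> Optional[int]:
        addr = self.index.get(key)
        return None if addr is None else self.log[addr][1]

    def upsert(self, key: str, value: int) -> None:
        """Blind update: overwrite without reading the old value."""
        addr = self.index.get(key)
        if addr is not None and addr >= self.read_only_boundary:
            self.log[addr][1] = value    # hot record: update in place
        else:
            self._append(key, value)     # new or cold record: append at tail

    def rmw(self, key: str, update: Callable[[int], int], initial: int) -> None:
        """Read-modify-write: derive the new value from the old one."""
        addr = self.index.get(key)
        if addr is None:
            self._append(key, initial)
        elif addr >= self.read_only_boundary:
            self.log[addr][1] = update(self.log[addr][1])   # in place
        else:
            self._append(key, update(self.log[addr][1]))    # copy to tail


store = ToyHybridLog(mutable_size=2)
store.upsert("a", 1)
store.rmw("a", lambda v: v + 1, initial=1)  # "a" is hot: updated in place
for k in ("b", "c", "d"):
    store.upsert(k, 0)                      # pushes "a" into the read-only region
store.rmw("a", lambda v: v + 1, initial=1)  # "a" is cold: new copy appended
```

In this sketch, frequently updated keys stay near the tail and keep being updated in place, while keys that have gone cold move back to the tail the next time they are written. This is a rough analogue of the self-organizing behavior described above, without any of the real system's concurrency or persistence.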
FASTER is open source (we have a C# version, and a C++ port of the KV). Find FASTER on GitHub. The original FASTER research paper appeared at the SIGMOD 2018 conference. Following this work, we have made seminal research contributions in the areas of recovery, distributed processing, fast ingestion and querying of semi-structured log data, and serverless resilient workflows. You can see a full list of publications related to the project at this link.
People
Badrish Chandramouli
Partner Research Manager
Donald Kossmann
Distinguished Scientist and Director of Redmond Lab
Ted Hart
Senior RSDE
Microsoft Research