Project Silica: Sustainable cloud archival storage in glass


Authors: Project Silica Research Director, Senior Researcher, Principal Researcher, Principal Research Software Engineer, Distinguished Engineer / Deputy Director, Principal Research Manager

This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (SOSP 2023), the premier forum for the theory and practice of computer systems software.

SOSP 2023
Project Silica: Towards Sustainable Cloud Archival Storage in Glass

Data growth demands a sustainable archival solution

For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data is staggering, encompassing personal photos, medical records, financial data, scientific insights, and more. By 2025, it’s estimated that we will generate a massive 175 zettabytes of data annually. Amidst this deluge, a substantial portion is vital for preserving our collective heritage and personal histories.  

Presently, magnetic technologies like tape and hard disk drives provide the most economical storage, but they come with limitations. Magnetic media lacks the longevity and durability essential for enduring archival storage, so data must be periodically migrated to new media: roughly every five years for hard disk drives and around every ten years for magnetic tape. Moreover, ensuring data longevity on magnetic media requires regular “scrubbing,” a process that reads the data back to identify corruption and fix any errors. Migration and scrubbing together lead to substantial energy consumption. We need a sustainable solution, one that ensures the preservation of our digital heritage without imposing an ongoing environmental and financial burden.
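To make the migration burden concrete, here is a minimal sketch of the arithmetic. The refresh intervals are the ones quoted above; the 100-year retention period and the function name `migrations_needed` are assumptions chosen only for illustration.

```python
def migrations_needed(retention_years: int, refresh_interval_years: int) -> int:
    """Count how many times data must be copied to fresh media.

    The initial write is not counted. A migration is assumed to happen each
    time the refresh interval elapses before the retention period ends.
    """
    if retention_years <= 0 or refresh_interval_years <= 0:
        raise ValueError("both arguments must be positive")
    return (retention_years - 1) // refresh_interval_years


# Hypothetical 100-year retention period (not a figure from the paper).
hdd_migrations = migrations_needed(100, 5)    # hard disk drives: every 5 years
tape_migrations = migrations_needed(100, 10)  # magnetic tape: about every 10 years
print(hdd_migrations, tape_migrations)
```

Under these assumptions, each migration is a full read and rewrite of the archive, in addition to the regular scrubbing passes.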

Project Silica: Sustainable and durable cloud archival storage

Our paper, “Project Silica: Towards Sustainable Cloud Archival Storage in Glass,” presented at SOSP 2023, describes Project Silica, a cloud-based storage system underpinned by quartz glass. This type of glass is a durable, chemically inert, and resilient low-cost medium, impervious to electromagnetic interference. Because data written in it can last thousands of years, quartz glass is ideal for archival storage, offering a sustainable solution and eliminating the need for periodic data refreshes.

Writing, reading, and decoding data

Ultrafast femtosecond lasers enable the writing process. Data is written inside a square glass platter, similar in size to a DVD, as voxels: permanent modifications to the physical structure of the glass made using femtosecond-scale laser pulses. Each voxel encodes multiple bits of data. Voxels are written in 2D layers across the XY plane, and hundreds of these layers are then stacked along the Z axis. To achieve high write throughput, we rapidly scan the laser pulses across the length of the media using a scanner similar to that used in barcode readers.
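As a rough illustration of multi-bit voxels arranged in stacked 2D layers, the sketch below packs bytes into fixed-width symbols and maps a symbol index to an (x, y, z) position. The 2 bits per voxel, the raster ordering, and both function names are assumptions made for this example. The real system's encoding, error correction, and layout are described in the paper and are not reproduced here.

```python
def bytes_to_symbols(data: bytes, bits_per_voxel: int = 2) -> list[int]:
    """Split each byte into symbols of bits_per_voxel bits, most significant first."""
    if bits_per_voxel not in (1, 2, 4, 8):
        raise ValueError("bits_per_voxel must divide 8 evenly")
    mask = (1 << bits_per_voxel) - 1
    symbols = []
    for byte in data:
        for shift in range(8 - bits_per_voxel, -1, -bits_per_voxel):
            symbols.append((byte >> shift) & mask)
    return symbols


def voxel_position(index: int, voxels_per_row: int, rows_per_layer: int) -> tuple[int, int, int]:
    """Map a symbol index to (x, y, z), assuming a simple raster order:
    fill a row along X, then rows along Y, then move to the next layer in Z."""
    per_layer = voxels_per_row * rows_per_layer
    z, remainder = divmod(index, per_layer)
    y, x = divmod(remainder, voxels_per_row)
    return x, y, z


symbols = bytes_to_symbols(b"\xe4")  # 0b11100100 becomes [3, 2, 1, 0]
positions = [voxel_position(i, voxels_per_row=2, rows_per_layer=2) for i in range(len(symbols))]
```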

To read data, we employ polarization microscopy to image the platter. The read drive scans sectors in a single swift Z-pattern, and the resulting images are processed for decoding. Different read drive options offer varying throughput, balancing cost and performance.

Data decoding relies on ML models that analyze images captured by the read drive, accurately converting signals from analog to digital. The glass library design includes independent read, write, and storage racks. Platters are stored in power-free storage racks and moved by free-roaming shuttles, ensuring minimal resource consumption for passive storage, as shown in Video 1. A one-way system between write racks and the rest of the library ensures that a written platter cannot be over-written under any circumstances, enforcing data integrity.
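The one-way path out of the write racks can be thought of as a small state machine with no transition back to the write state. The sketch below is a conceptual model only: the `Location` states, the `ALLOWED` transition table, and the `Platter` class are hypothetical names, and in the library itself the guarantee comes from the physical design rather than from software checks like these.

```python
from enum import Enum


class Location(Enum):
    BLANK = "blank"
    WRITE_RACK = "write_rack"
    STORAGE_RACK = "storage_rack"
    READ_RACK = "read_rack"


# Assumed transition table: once a platter leaves the write rack,
# no path leads back to it.
ALLOWED = {
    Location.BLANK: {Location.WRITE_RACK},
    Location.WRITE_RACK: {Location.STORAGE_RACK},
    Location.STORAGE_RACK: {Location.READ_RACK},
    Location.READ_RACK: {Location.STORAGE_RACK},
}


class Platter:
    def __init__(self, platter_id: str) -> None:
        self.platter_id = platter_id
        self.location = Location.BLANK

    def move(self, destination: Location) -> None:
        """Move the platter, rejecting any transition not in ALLOWED."""
        if destination not in ALLOWED[self.location]:
            raise PermissionError(
                f"{self.platter_id}: {self.location.value} -> {destination.value} is not allowed"
            )
        self.location = destination


platter = Platter("p-001")
platter.move(Location.WRITE_RACK)
platter.move(Location.STORAGE_RACK)
platter.move(Location.READ_RACK)
```

In this model, calling `platter.move(Location.WRITE_RACK)` on a written platter raises `PermissionError`, mirroring the rule that a written platter can never be overwritten.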

Video 1. The Silica library prototype demonstrates the flexible and scalable design of the system and its ability to sustainably service archival workloads. 

Azure workload analysis informs Silica’s design

To build an optimal storage system around the core Silica technology, we extensively studied cloud archival data workloads from Azure Storage. Surprisingly, we discovered that small read requests dominate the read workload, yet a small percentage of requests constitute the majority of read bytes, creating a skewed distribution, as illustrated in Figure 1.

Project Silica paper at SOSP 2023: A double bar chart with 2 y-axes: percentage of total read operations on the left y-axis, and percentage of total bytes read on the right y-axis; with file size buckets on the x-axis. The graph shows that the majority of read operations are for files with small file sizes, but they only make up a small fraction of all the bytes read (i.e., 58% of operations are for file sizes smaller than 4MB, but make up less than 1.2% of all bytes read). Conversely, most bytes read are for large files, but make up a small fraction of all read operations (i.e., 85% of bytes read are for files larger than 256MB, but make up less than 2% of requests).
Figure 1. The distribution of read request sizes. Most requests are for small files, but they make up a small percentage of the total load in bytes.
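The skew in Figure 1 comes from comparing two shares for the same group of requests: their share of operations and their share of bytes. A minimal sketch of that calculation follows. The function name and the sample request sizes are made up for illustration and are not Azure data.

```python
def small_request_shares(sizes_bytes: list[int], threshold_bytes: int) -> tuple[float, float]:
    """Return (share of operations, share of bytes) for requests below the threshold."""
    if not sizes_bytes:
        raise ValueError("need at least one request")
    total_bytes = sum(sizes_bytes)
    if total_bytes <= 0:
        raise ValueError("total bytes must be positive")
    small = [size for size in sizes_bytes if size < threshold_bytes]
    return len(small) / len(sizes_bytes), sum(small) / total_bytes


# Made-up trace: three 1 MB reads and one 997 MB read.
MB = 1_000_000
ops_share, bytes_share = small_request_shares([1 * MB, 1 * MB, 1 * MB, 997 * MB], 4 * MB)
print(f"{ops_share:.0%} of operations, {bytes_share:.1%} of bytes")
```

In this toy trace, the small reads account for most of the operations but a tiny fraction of the bytes, which is the same shape as the distribution in Figure 1.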

This implies that minimizing the latency of mechanical movement in the library is crucial for optimal performance. Silica glass, a random-seeking storage medium, can suitably meet these requirements as it eliminates the necessity for spooling, unlike magnetic tape. Figure 2 illustrates substantial differences in read demand across various datacenters. These results suggest that we need a flexible library design that can scale resources for each datacenter’s workload. Studying these archival workloads has been instrumental in helping us establish the core design principles for the Silica storage system.

Project Silica paper at SOSP 2023: Figure 2. A bar chart showing different, unlabeled data centers on the x-axis, and tail over median read throughput on the y-axis on a log scale. The graph shows up to 7 orders of magnitude mean-to-tail difference within a data center, and up to 5 orders of magnitude variability in the mean-to-tail difference across different data centers.
Figure 2. Tail over median read load for different datacenters. The data shows significant variation across and within datacenters.
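A tail-over-median ratio like the one plotted in Figure 2 can be computed from a series of read-load samples as sketched below. The choice of the 99.9th percentile as "tail", the nearest-rank percentile method, and the sample values are assumptions, since this post does not state how the tail was defined.

```python
import math
import statistics


def percentile_nearest_rank(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def tail_over_median(loads: list[float], tail_percentile: float = 99.9) -> float:
    """Ratio of the tail read load to the median read load."""
    median = statistics.median(loads)
    if median <= 0:
        raise ValueError("median load must be positive")
    return percentile_nearest_rank(loads, tail_percentile) / median


# Made-up samples: a mostly idle library with one burst.
ratio = tail_over_median([1, 1, 1, 1, 1000])
```

A ratio that spans several orders of magnitude means that provisioning for the median would badly under-serve bursts, which is one reason a library that can scale read resources per datacenter is attractive.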

Project Silica’s versatile storage system

We designed and evaluated a comprehensive storage system that manages error correction, data layout, request scheduling, and shuttle traffic management. Our design effectively manages IOPS-intensive tasks, meeting the expected service level objective (SLO) of an archival storage tier, approximately 15 hours. Interestingly, even in volume-intensive scenarios where a large number of bytes are read, our system efficiently handles requests using read drives with low throughput. In both cases, throughput demands are significantly below those of traditional tape drives. This is shown in Figure 3. The paper provides an extensive description of this system, and the video above shows our prototype library’s capabilities. 

Project Silica paper at SOSP 2023: Figure 3. A line chart with 3 lines: Volume, IOPS, and Typical. The x-axis shows read drive throughput ranging from 30MB/s to 210MB/s in increments of 30, and the y-axis shows the tail completion time in hours of the system running each of the workloads represented by each line. The graph shows that all workloads complete within the desired 15-hour SLO, even with 30MB/s read drives. Tail completion time improves as read drive throughput increases, but starts to plateau past 60MB/s for all workloads.
Figure 3. Volume and IOPS workloads represent different extremes in the spectrum of read workloads. Our design can service both workloads well within the expected SLO for an archival storage tier, at about 15 hours.
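One way to build intuition for why faster read drives give diminishing returns on small requests is a toy model in which every request pays a fixed mechanical overhead plus a transfer time. This is not the paper's simulator: the function name, the 60-second overhead, and the 4 MB request size are assumptions picked only to illustrate the effect.

```python
def request_time_s(size_bytes: int, drive_mb_per_s: float, overhead_s: float) -> float:
    """Toy service time: fixed mechanical overhead plus transfer time at the drive's throughput."""
    if drive_mb_per_s <= 0:
        raise ValueError("drive throughput must be positive")
    return overhead_s + size_bytes / (drive_mb_per_s * 1_000_000)


# Hypothetical small read (4 MB) with an assumed 60 s of platter fetch and mount time.
slow_drive = request_time_s(4_000_000, drive_mb_per_s=30, overhead_s=60.0)
fast_drive = request_time_s(4_000_000, drive_mb_per_s=210, overhead_s=60.0)
print(f"30 MB/s: {slow_drive:.2f} s, 210 MB/s: {fast_drive:.2f} s")
```

Under these assumptions a drive that is seven times faster saves only a fraction of a second on a small read, because the fixed overhead dominates. This is consistent with the earlier observation that minimizing mechanical latency matters for a read workload dominated by small requests.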

Diverse applications for sustainably archiving humanity’s data

Project Silica holds promise in numerous sectors, such as healthcare, scientific research, and finance, where secure and durable archival storage of sensitive data is crucial. Research institutions could benefit from Silica’s ability to store vast datasets generated from experiments and simulations, ensuring the integrity and accessibility of research findings over time. Similarly, healthcare organizations could securely archive patient records, medical imaging data, and research outcomes for long-term reference and analysis. 

As the volume of globally generated data grows, traditional storage solutions will continue to face challenges in terms of scalability, energy-efficiency, and long-term durability. Moreover, as technologies like AI and advanced analytics progress, the need for reliable and accessible archival data will continue to intensify. Project Silica is well-positioned to play a pivotal role in supporting these technologies by providing a stable, secure, and sustainable repository for the vast amounts of data we create and rely on.
