Microsoft at ASPLOS 2024: Advancing hardware and software for high-scale, secure, and efficient modern applications

Published April 29, 2024

By Rodrigo Fonseca, Sr Principal Research Manager, and Madan Musuvathi, Partner Research Manager


Modern computer systems and applications, with their unprecedented scale, complexity, and security needs, require careful co-design and co-evolution of hardware and software. The ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) is the main forum where researchers bridge the gap between architecture, programming languages, and operating systems to advance the state of the art.

ASPLOS 2024 is taking place in San Diego from April 27 to May 1, and Microsoft researchers and collaborators have a strong presence there. Members of our team have taken on key roles in organizing the event, including serving on the program and external review committees and as program co-chair.

We are pleased to share that eight papers by Microsoft researchers and their collaborators have been accepted to the conference, spanning a broad range of topics. In AI and deep learning, they cover power and frequency management for GPUs and LLMs, processing-in-memory (PIM) for deep learning, and instrumentation frameworks for deep neural networks. On the infrastructure side, they cover memory safety with CHERI, I/O prefetching for modern storage, and smart oversubscription of burstable virtual machines. This post highlights some of this work.

Paper highlights

Characterizing Power Management Opportunities for LLMs in the Cloud

The rising popularity of LLMs and generative AI has created unprecedented demand for GPUs. However, available power is a key limiting factor in expanding a GPU fleet. This paper characterizes power usage in LLM clusters, examines power consumption patterns across multiple LLMs, and identifies how inference and training differ in their power consumption. The investigation reveals that average and peak power consumption in inference clusters is relatively low, leaving substantial headroom for power oversubscription. Building on this, the authors propose POLCA, a framework for power oversubscription in GPU clusters that is robust, reliable, and readily deployable. POLCA can deploy 30% more servers in the same GPU clusters for inference workloads, with minimal performance degradation.
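The arithmetic behind power oversubscription can be sketched in a few lines. This is a minimal illustration with hypothetical numbers: the power budget, per-server rated power, power samples, and 5% safety margin are assumptions and do not come from the paper. It shows why provisioning each server for its rated worst-case power fits fewer servers under a fixed budget than provisioning for a high percentile of the power actually observed. POLCA's mechanisms for keeping rare power peaks within the budget are not modeled here.

```python
import math
from typing import Sequence


def servers_that_fit(budget_w: float, per_server_w: float) -> int:
    """How many servers can be provisioned at per_server_w under budget_w."""
    return math.floor(budget_w / per_server_w)


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, with q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(math.ceil(q / 100 * len(ordered)), 1)
    return ordered[rank - 1]


# Hypothetical figures for illustration only.
BUDGET_W = 1_000_000   # power budget for one cluster row
NAMEPLATE_W = 10_000   # rated (worst-case) power per GPU server
observed_w = [5200, 6100, 7000, 7400, 7600, 7700, 7500, 6900, 6000, 5500]

# Conservative: every server is assumed to draw its rated power at once.
conservative = servers_that_fit(BUDGET_W, NAMEPLATE_W)

# Oversubscribed: provision against the observed 99th percentile
# plus an assumed 5% safety margin.
peak_w = percentile(observed_w, 99)
oversubscribed = servers_that_fit(BUDGET_W, peak_w * 1.05)

print(conservative, peak_w, oversubscribed)  # 100 7700 123
```

The whole gain here comes from the gap between rated and observed power. How large that gap is in practice depends on measured workload behavior, which is what the paper characterizes.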

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization

PIM-DL is the first deep learning framework designed specifically for off-the-shelf processing-in-memory (PIM) systems, and it can offload most of the computation in a neural network to PIM. To overcome the limited compute capability of PIM hardware, it replaces compute-heavy matrix multiplications with lookup tables (LUTs). This lets neural networks run efficiently on PIM architectures while greatly reducing the need for complex arithmetic. PIM-DL achieves up to ~37x speedup over traditional GEMM-based systems and delivers competitive speedups against CPUs and GPUs.
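The general lookup-table idea can be illustrated with a product-quantization-style sketch. This is our assumption about the family of techniques involved, not PIM-DL's actual algorithm, codebook training, or PIM data layout. The input vector is split into sub-vectors, and each sub-vector is replaced by its nearest centroid from a small codebook. The products of every centroid with the matching rows of the weight matrix are precomputed, so a matrix-vector product reduces to table lookups and additions. The codebooks and weights below are hypothetical toy values.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # weights[i][j]: input dim i, output dim j


def nearest_centroid(sub: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to sub (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for idx, c in enumerate(centroids):
        d = sum((a - b) ** 2 for a, b in zip(sub, c))
        if d < best_d:
            best, best_d = idx, d
    return best


def build_luts(weights: Matrix, codebooks: List[List[Vector]], sub_len: int):
    """luts[s][k][j] = dot(codebooks[s][k], column j of weight block s)."""
    n_out = len(weights[0])
    luts = []
    for s, centroids in enumerate(codebooks):
        rows = weights[s * sub_len:(s + 1) * sub_len]
        luts.append([
            [sum(c[i] * rows[i][j] for i in range(sub_len)) for j in range(n_out)]
            for c in centroids
        ])
    return luts


def lut_matvec(x: Vector, codebooks, luts, sub_len: int) -> Vector:
    """Approximate x @ weights using only lookups and additions."""
    out = [0.0] * len(luts[0][0])
    for s, centroids in enumerate(codebooks):
        k = nearest_centroid(x[s * sub_len:(s + 1) * sub_len], centroids)
        for j, v in enumerate(luts[s][k]):
            out[j] += v
    return out


# Hypothetical toy layer: 4 inputs, 2 outputs, two sub-vectors of length 2.
W = [[1, 2], [3, 4], [5, 6], [7, 8]]
codebooks = [[[0, 0], [1, 1]], [[1, 0], [0, 1]]]
luts = build_luts(W, codebooks, sub_len=2)
print(lut_matvec([1, 1, 0, 1], codebooks, luts, 2))  # [11.0, 14.0]
```

When every sub-vector coincides with a centroid, the result is exact. For other inputs it is an approximation whose quality depends on how well the codebooks fit the data.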

Cornucopia Reloaded: Load Barriers for CHERI Heap Temporal Safety

Memory safety bugs have plagued software for over 50 years and underpin some 70% of common vulnerabilities and exposures (CVEs) every year. The CHERI capability architecture is an emerging technology for spatial memory safety and software compartmentalization, notably through Arm's Morello and Microsoft's CHERIoT platforms. In this paper, the authors demonstrate the viability of object-granularity heap temporal safety (protection against use-after-free) built on CHERI, with considerably lower overhead than prior work.
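The invariant a load barrier enforces can be shown with a deliberately simplified software model. This is a toy: `Capability`, `ToyHeap`, and `deref` are hypothetical names, and a Python set stands in for the revocation metadata. It does not model CHERI hardware or the paper's mechanism. The point it illustrates is that once an object is freed, any capability to it that the program later loads from memory comes back with its validity tag cleared, so it can no longer be used to reach the freed object.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Capability:
    base: int
    length: int
    tag: bool = True  # validity bit; an untagged capability cannot be used


class ToyHeap:
    """Hypothetical model: memory holding capabilities, plus revocation state."""

    def __init__(self) -> None:
        self.revoked: Set[int] = set()               # bases of freed objects
        self.memory: Dict[int, Capability] = {}      # address -> stored capability

    def free(self, cap: Capability) -> None:
        # Mark the object as freed; stand-in for setting revocation metadata.
        self.revoked.add(cap.base)

    def store(self, addr: int, cap: Capability) -> None:
        self.memory[addr] = cap

    def load(self, addr: int) -> Capability:
        cap = self.memory[addr]
        # Load barrier: before returning a capability to the program,
        # check whether it points to freed memory and, if so, clear its tag.
        if cap.tag and cap.base in self.revoked:
            cap = Capability(cap.base, cap.length, tag=False)
            self.memory[addr] = cap  # write back the invalidated copy
        return cap


def deref(cap: Capability) -> int:
    """Return the address a capability grants access to, if it is valid."""
    if not cap.tag:
        raise PermissionError("untagged capability")
    return cap.base


heap = ToyHeap()
obj = Capability(base=0x1000, length=64)
heap.store(0x2000, obj)             # a pointer to obj is saved in memory
print(hex(deref(heap.load(0x2000))))  # 0x1000
heap.free(obj)                      # obj is freed; the stored pointer dangles
print(heap.load(0x2000).tag)        # False
```

A real system cannot afford a software check on every load. This model only conveys what a stale capability looks like after the barrier has acted, not how the paper makes the check efficient.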

AUDIBLE: A Convolution-Based Resource Allocator for Oversubscribing Burstable Virtual Machines

Burstable virtual machines (BVMs) are cloud VMs that can temporarily burst above their baseline resource allocation. This paper shows how to oversubscribe BVMs. It first studies the characteristics of BVMs on Microsoft Azure and explains why traditional approaches, which rely either on a fixed oversubscription ratio or on the Central Limit Theorem, work poorly for BVMs: they lead to either low utilization or high rates of server capacity violations. Drawing on lessons from this workload study, the authors developed AUDIBLE, a new approach built on a nonparametric statistical model that makes it lightweight and workload independent. The study shows that AUDIBLE achieves high system utilization while meeting stringent requirements on server capacity violations.
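The "convolution-based" part of AUDIBLE's name can be made concrete with a standard statistical fact: if the resource usages of co-located VMs are independent, the distribution of their sum is the convolution of the individual distributions. The sketch below rests on several assumptions and is not AUDIBLE's actual model. It assumes independence, discretizes usage into hypothetical integer units such as cores, uses empirical histograms as the nonparametric distributions, and applies an admission threshold we chose ourselves. A new VM is admitted only if the probability that total usage exceeds server capacity stays below a target.

```python
from collections import Counter
from typing import List

PMF = List[float]  # pmf[u] = probability that usage equals u units


def empirical_pmf(samples: List[int]) -> PMF:
    """Nonparametric distribution: normalized histogram of observed usage."""
    counts = Counter(samples)
    n = len(samples)
    return [counts[u] / n for u in range(max(samples) + 1)]


def convolve(a: PMF, b: PMF) -> PMF:
    """Distribution of X + Y for independent X ~ a and Y ~ b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def violation_probability(vm_pmfs: List[PMF], capacity: int) -> float:
    """P(total usage of all VMs on the server > capacity)."""
    total = [1.0]  # distribution of an empty server: usage 0 with probability 1
    for pmf in vm_pmfs:
        total = convolve(total, pmf)
    return sum(total[capacity + 1:])


def can_admit(current: List[PMF], candidate: PMF, capacity: int,
              max_violation: float) -> bool:
    """Admit the candidate only if the violation risk stays within target."""
    return violation_probability(current + [candidate], capacity) <= max_violation


# Hypothetical VM that is idle or bursting to 2 cores, each half the time.
vm = empirical_pmf([0, 2, 0, 2])
print(violation_probability([vm, vm], capacity=3))   # 0.25
print(can_admit([vm, vm], vm, capacity=3, max_violation=0.3))  # False
```

A Central Limit Theorem allocator would instead approximate the total as a normal distribution. One plausible reason this breaks down for bursty VMs is that a server hosts too few of them for the approximation to hold. Convolving empirical distributions avoids that approximation, at the cost of the independence assumption made here.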

Complete list of accepted publications by Microsoft researchers

Amanda: Unified Instrumentation Framework for Deep Neural Networks
Yue Guan, Yuxian Qiu, and Jingwen Leng; Fan Yang, Microsoft Research; Shuo Yu, Shanghai Jiao Tong University; Yunxin Liu, Tsinghua University; Yu Feng and Yuhao Zhu, University of Rochester; Lidong Zhou, Microsoft Research; Yun Liang, Peking University; Chen Zhang, Chao Li, and Minyi Guo, Shanghai Jiao Tong University

AUDIBLE: A Convolution-Based Resource Allocator for Oversubscribing Burstable Virtual Machines
Seyedali Jokar Jandaghi and Kaveh Mahdaviani, University of Toronto; Amirhossein Mirhosseini, University of Michigan; Sameh Elnikety, Microsoft Research; Cristiana Amza and Bianca Schroeder, University of Toronto

Characterizing Power Management Opportunities for LLMs in the Cloud
Pratyush Patel, Microsoft Azure and University of Washington; Esha Choukse, Chaojie Zhang, and Íñigo Goiri, Azure Research; Brijesh Warrier, Nithish Mahalingam, and Ricardo Bianchini, Microsoft Azure Research

Cornucopia Reloaded: Load Barriers for CHERI Heap Temporal Safety
Nathaniel Wesley Filardo, University of Cambridge and Microsoft Research; Brett F. Gutstein, Jonathan Woodruff, Jessica Clarke, and Peter Rugg, University of Cambridge; Brooks Davis, SRI International; Mark Johnston, University of Cambridge; Robert Norton, Microsoft Research; David Chisnall, SCI Semiconductor; Simon W. Moore, University of Cambridge; Peter G. Neumann, SRI International; Robert N. M. Watson, University of Cambridge

CrossPrefetch: Accelerating I/O Prefetching for Modern Storage
Shaleen Garg and Jian Zhang, Rutgers University; Rekha Pitchumani, Samsung; Manish Parashar, University of Utah; Bing Xie, Microsoft; Sudarsun Kannan, Rutgers University

Kimbap: A Node-Property Map System for Distributed Graph Analytics
Hochan Lee, University of Texas at Austin; Roshan Dathathri, Microsoft Research; Keshav Pingali, University of Texas at Austin

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
Cong Li and Zhe Zhou, Peking University; Yang Wang, Microsoft Research; Fan Yang, Nankai University; Ting Cao and Mao Yang, Microsoft Research; Yun Liang and Guangyu Sun, Peking University

Predict; Don’t React for Enabling Efficient Fine-Grain DVFS in GPUs
Srikant Bharadwaj, Microsoft Research; Shomit Das, Qualcomm; Kaushik Mazumdar and Bradford M. Beckmann, AMD; Stephen Kosonocky, Uhnder

Conference organizers from Microsoft

Career opportunities

Microsoft welcomes talented individuals across various roles at Microsoft Research, Azure Research, and other departments. We are always pushing the boundaries of computer systems to improve the scale, efficiency, and security of all our offerings. You can review our open research-related positions here.

Meet the authors

Rodrigo Fonseca

Sr Principal Research Manager


Madan Musuvathi

Partner Research Manager

