Optimizing GPU Data Center Power
- Tawfik Rahal-Arabi
- Paul Van der Arend
- Ashish Jain
- Mehdi Saidi
- Rashad Oreifej
- Sriram Sundaram
- Srilatha Manne
- Indrani Paul
- Rajit Seahra
- Frank Helms
- Esha Choukse
- Nithish Mahalingam
- Brijesh Warrier
- Ricardo Bianchini
APCCAS
Published by IEEE | Organized by IEEE
GPUs are used in products ranging from ultra-low-power mobile devices to high-performance machine learning accelerators in data centers. Across these products, power and power delivery have become top limiters of performance and are key considerations in the early stages of product definition and design. In particular, the recent growth of AI workloads has significantly exacerbated the power and power delivery problem. In this joint AMD and Microsoft paper, we present some of the power optimizations used in the latest generation of AMD GPUs, including the recently announced AMD Instinct™ MI300 GPU. To this end, we cover power and power delivery optimization techniques spanning the product life cycle, from architecture through physical design, validation, test, and manufacturing, and conclude with a data-center-scale view of the challenges ahead in power-optimizing GPUs for the data centers of the future.