Optimizing GPU Data Center Power

APCCAS

Published by IEEE | Organized by IEEE

GPUs are used in products ranging from ultra-low-power mobile devices to high-performance machine learning accelerators in data centers. Across these products, power and power delivery have become top limiters of performance and are key considerations in the early stages of product definition and design. In particular, the power and power delivery problem has been significantly exacerbated by the recent growth of AI workloads. In this joint AMD and Microsoft paper, we present some of the power optimizations used in the latest generation of AMD GPUs, including the recently announced AMD Instinct™ MI300 GPU. We cover power and power delivery optimization techniques spanning the product life cycle, from architecture through physical design, validation, test, and manufacturing, and conclude with a data-center-scale view of the challenges ahead in power-optimizing GPUs for the data centers of the future.