GPU Occupancy Prediction of Deep Learning Models Using Graph Neural Network
- Hengquan Mei
- Huaizhi Qu
- Jingwei Sun
- Yanjie Gao
- Haoxiang Lin
- Guangzhong Sun
Published by IEEE
Proceedings of the 25th IEEE International Conference on Cluster Computing (CLUSTER)
Over the past few years, deep learning has been rapidly adopted in many fields. Among the various hardware accelerators designed for deep learning computation, graphics processing units (GPUs) are the most widely used. GPU occupancy, the average ratio of active warps to the maximum number of supported warps across all streaming multiprocessors, is an essential indicator of how well GPUs are utilized. Predicting the GPU occupancy of deep learning models is critical for improving both job runtime performance and platform resource efficiency. However, GPU occupancy prediction is challenging due to the complex factors hidden in framework runtimes and the diverse architectures and hyperparameters of models. In this paper, we propose DNN-occu to predict the GPU occupancy of deep learning models. Our key observation is that models can be represented as directed acyclic computation graphs. DNN-occu extracts a set of occupancy-related features from the computational semantics of the graph nodes and edges. It also employs a novel graph neural network for better feature encoding and prediction generalization. Experiments on various configurations of real-world deep learning models show that DNN-occu achieves high accuracy for occupancy prediction (with an overall error of 9.271%) and generalizes well to unseen models. In addition, we apply DNN-occu in a trace-driven simulation of deep learning workload scheduling and achieve up to a 31.45% increase in overall GPU utilization and a 19.71% reduction in makespan.
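For concreteness, the occupancy metric defined in the abstract (the average ratio of active warps to maximum supported warps over all streaming multiprocessors) can be sketched as a small computation. This is an illustrative sketch only; the function name and the example warp counts are assumptions, not taken from the paper or from any GPU vendor API:

```python
# Illustrative sketch of the occupancy metric described in the abstract:
# the average of (active warps / maximum supported warps) over all
# streaming multiprocessors (SMs). All names and numbers are hypothetical.

def gpu_occupancy(active_warps_per_sm, max_warps_per_sm):
    """Average (active / max) warp ratio across SMs; result lies in [0, 1]."""
    ratios = [active / max_warps_per_sm for active in active_warps_per_sm]
    return sum(ratios) / len(ratios)

# Example: a device whose SMs each support 64 resident warps,
# sampled with 4 SMs holding 32, 48, 64, and 16 active warps.
occ = gpu_occupancy([32, 48, 64, 16], max_warps_per_sm=64)
print(round(occ, 4))  # 0.625
```

In practice this quantity corresponds to what profilers report as "achieved occupancy", averaged over SMs; the paper's goal is to predict it from a model's computation graph without running the workload.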