Runtime Performance Prediction for Deep Learning Models with Graph Neural Network (Tech Report)

MSR-TR-2021-3

Published by Microsoft

Recently, deep learning (DL) has been widely adopted in many application domains. Predicting the runtime performance of DL models, such as their GPU memory consumption and training time, is important for boosting development productivity and reducing resource waste, because improper configurations of hyperparameters and neural architectures can result in many failed training jobs or poorly performing models. However, general runtime performance prediction for DL models is challenging due to the hybrid DL programming paradigm, complicated hidden factors within the framework runtime, the vast model configuration space, and wide variation across models. In this paper, we propose DNNPerf, a novel and general machine learning approach that predicts the runtime performance of DL models with a graph neural network. DNNPerf represents a DL model as a directed acyclic computation graph and defines a rich set of performance-related features based on the computational semantics of both nodes and edges. We also propose a new Attention-based Node-Edge Encoder to better encode the node and edge features. DNNPerf is extensively evaluated on thousands of configurations of real-world and synthetic DL models to predict their GPU memory consumption and training time. The experimental results show that DNNPerf achieves an overall error of 13.684% for GPU memory consumption prediction and an overall error of 7.443% for training time prediction, outperforming all the compared methods.
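To make the two core ideas in the abstract concrete, the sketch below shows, in PyTorch, (1) a DL model encoded as a directed acyclic computation graph whose nodes (operators) and edges (tensors) carry performance-related features, and (2) one attention-based message-passing layer in which each edge's attention weight is computed jointly from source-node, target-node, and edge features. This is a minimal illustration under our own assumptions, not the DNNPerf implementation: the class name `AttnNodeEdgeEncoder`, the feature dimensions, and the GRU-based node update are all hypothetical.

```python
# Minimal sketch (assumptions, not the DNNPerf implementation): a computation
# graph with node/edge features and one attention-based node-edge encoder layer.
import torch
import torch.nn as nn


class AttnNodeEdgeEncoder(nn.Module):
    """One message-passing layer over a DAG-shaped computation graph.

    Attention logits are computed from the concatenation of source-node,
    target-node, and edge features, so edge features (e.g., the size of the
    tensor flowing between two operators) influence how messages from
    upstream operators are aggregated.
    """

    def __init__(self, node_dim: int, edge_dim: int, hidden_dim: int):
        super().__init__()
        in_dim = 2 * node_dim + edge_dim
        self.msg = nn.Linear(in_dim, hidden_dim)     # per-edge message
        self.att = nn.Linear(in_dim, 1)              # per-edge attention logit
        self.upd = nn.GRUCell(hidden_dim, node_dim)  # node-state update

    def forward(self, x, edge_index, edge_attr):
        # x: [N, node_dim] node features; edge_index: [2, E] (src, dst) pairs
        # following the edge direction of the DAG; edge_attr: [E, edge_dim].
        src, dst = edge_index
        z = torch.cat([x[src], x[dst], edge_attr], dim=-1)  # [E, in_dim]

        # Softmax attention over the incoming edges of each target node.
        logits = self.att(z).squeeze(-1)
        alpha = torch.exp(logits - logits.max())  # shift for stability
        denom = torch.zeros(x.size(0), device=x.device).index_add_(0, dst, alpha)
        alpha = alpha / denom[dst].clamp_min(1e-12)

        # Attention-weighted messages, summed per target node.
        m = alpha.unsqueeze(-1) * torch.relu(self.msg(z))
        agg = torch.zeros(x.size(0), m.size(-1), device=x.device)
        agg = agg.index_add_(0, dst, m)
        return self.upd(agg, x)  # [N, node_dim] updated node states


# Hypothetical usage: 5 operator nodes (features could include FLOPs,
# parameter counts, hyperparameters) chained in a linear DAG, with edge
# features describing the tensors passed between them.
x = torch.randn(5, 32)
edge_index = torch.tensor([[0, 1, 2, 3],   # source operators
                           [1, 2, 3, 4]])  # destination operators
edge_attr = torch.randn(4, 8)
layer = AttnNodeEdgeEncoder(node_dim=32, edge_dim=8, hidden_dim=64)
h = layer(x, edge_index, edge_attr)  # refined node embeddings
graph_emb = h.mean(dim=0)            # pooled embedding for a regression head
```

Deriving the attention logit from node and edge features jointly is one plausible reading of an "attention-based node-edge encoder"; the report's actual formulation, feature set, and readout for the memory and time regression targets may differ.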