Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?

International Symposium on Performance Analysis of Systems and Software (ISPASS)

Published by IEEE

Query processing for data analytics with machine learning scoring involves executing heterogeneous operations in a pipelined fashion. Hardware acceleration is one approach to improving pipeline performance and freeing up processor resources by offloading computations to accelerators. However, the performance benefits of accelerators can be limited by the overheads of offloading computation and data. Although prior work has studied acceleration opportunities, including accelerators for machine learning operations, end-to-end application performance has not been well studied, particularly for data analytics and model scoring pipelines. In this paper, we study the speedups and overheads of using PCIe-based hardware accelerators in such pipelines. In particular, we analyze the effectiveness of using GPUs and FPGAs to accelerate scoring for random forest, a popular machine learning model, on tabular input data obtained from Microsoft SQL Server. We observe that the offloading decision, as well as the choice of the optimal hardware backend, should depend at least on the model complexity (e.g., number of features and tree depth), the scoring data size, and the overheads associated with data movement and invocation of the pipeline stages. We also highlight potential directions for future research based on our findings.
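
To illustrate how the factors named above could interact in an offloading decision, the following is a minimal sketch of an additive cost model. It is not the method or the measurements from the paper: the `Backend` class, the functions `estimated_time` and `choose_backend`, and every numeric parameter are hypothetical and chosen only to show the shape of the trade-off between fixed invocation overhead, data movement over the interconnect, and compute time that grows with model complexity and scoring data size.

```python
import math
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical cost parameters for one scoring backend (illustrative only)."""
    name: str
    invoke_overhead_s: float     # fixed cost per invocation (setup, launch)
    transfer_bytes_per_s: float  # host-to-device bandwidth; math.inf means no transfer
    node_visits_per_s: float     # tree-node evaluations per second


def estimated_time(b, rows, features, trees, depth, bytes_per_value=4):
    """Estimate scoring time as fixed overhead + data movement + compute."""
    transfer = rows * features * bytes_per_value / b.transfer_bytes_per_s
    compute = rows * trees * depth / b.node_visits_per_s
    return b.invoke_overhead_s + transfer + compute


def choose_backend(backends, rows, features, trees, depth):
    """Return the backend with the lowest estimated scoring time."""
    return min(backends, key=lambda b: estimated_time(b, rows, features, trees, depth))


# Assumed, made-up parameters: the CPU scores in place, the accelerator pays
# a fixed invocation cost and a transfer cost but evaluates nodes faster.
CPU = Backend("cpu", invoke_overhead_s=0.0,
              transfer_bytes_per_s=math.inf, node_visits_per_s=1e9)
ACCEL = Backend("accelerator", invoke_overhead_s=1e-3,
                transfer_bytes_per_s=1e10, node_visits_per_s=5e10)

small = choose_backend([CPU, ACCEL], rows=100, features=10, trees=10, depth=5)
large = choose_backend([CPU, ACCEL], rows=1_000_000, features=100, trees=100, depth=10)
print(small.name, large.name)
```

Under these assumed parameters, the fixed invocation overhead dominates for the small input, while for the large input and deeper model the higher accelerator throughput outweighs the transfer cost. Real break-even points would have to be measured on the actual pipeline.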