Powering Multi-Task Federated Learning with Competitive GPU Resource Sharing
- Yongbo Yu,
- Fuxun Yu,
- Zirui Xu,
- Di Wang,
- Minjia Zhang,
- Ang Li,
- Shawn Bray,
- Chenchen Liu,
- Xiang Chen
Federated learning (FL) has been applied to train models for many different tasks, which poses new computation challenges, especially when a single device must train multiple tasks concurrently. In this paper, we first profile the FL multi-task training process at the operator level to identify the problems that arise in FL multi-task training. Second, we propose a Competitive GPU Resource Sharing method that efficiently partitions GPU resources among concurrent models to improve training efficiency. Third, to address the imbalanced-data problem in multi-device FL training, we partition GPU resources according to the workload of each model. Experiments show that our method achieves a 2.1× speedup.
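The abstract does not describe the partitioning mechanism itself. Below is a minimal sketch of workload-proportional GPU partitioning, assuming NVIDIA MPS is running and using its `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable as the SM-partitioning knob; the function names (`train_task`, `partition_by_workload`), the stand-in model, and the workload numbers are all hypothetical illustrations, not the paper's implementation.

```python
import os
import multiprocessing as mp

def train_task(task_id: int, thread_pct: int) -> None:
    # Hypothetical per-task training worker. With the NVIDIA MPS daemon
    # running, CUDA_MPS_ACTIVE_THREAD_PERCENTAGE caps the fraction of SMs
    # this process's CUDA context may occupy. It must be set before the
    # CUDA context is created, hence the deferred torch import.
    os.environ["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(thread_pct)
    import torch

    model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a real FL model
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(100):  # stand-in training loop
        x = torch.randn(64, 1024, device="cuda")
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

def partition_by_workload(workloads: list[float]) -> list[int]:
    # Allocate SM shares proportional to each task's estimated workload
    # (e.g., per-round FLOPs or a measured step time).
    total = sum(workloads)
    return [max(1, round(100 * w / total)) for w in workloads]

if __name__ == "__main__":
    mp.set_start_method("spawn")
    # Example: three FL tasks with imbalanced (hypothetical) workloads.
    workloads = [3.0, 1.0, 2.0]
    shares = partition_by_workload(workloads)  # -> [50, 17, 33]
    procs = [mp.Process(target=train_task, args=(i, s))
             for i, s in enumerate(shares)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

One process per task keeps each model's CUDA context separate, so the MPS cap applies per task; a streams-based design within one process would be an alternative if context overhead matters.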