Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits
- Zongqi Wan ,
- Jialin Zhang ,
- Wei Chen ,
- Xiaoming Sun ,
- Zhijie Zhang
Proceedings of the 40th International Conference on Machine Learning (ICML) |
We investigate the online bandit learning of the monotone multi-linear DR-submodular functions, designing the algorithm \(BanditMLSM\) that attains \(O(T^{2/3}\log T)\) of \((1−1/e)\)-regret. Then we reduce submodular bandit with partition matroid constraint and bandit sequential monotone maximization to the online bandit learning of the monotone multi-linear DR-submodular functions, attaining \(O(T^{2/3}\log T)\) of \((1−1/e)\)-regret in both problems, which improve the existing results. To the best of our knowledge, we are the first to give a sublinear regret algorithm for the submodular bandit with partition matroid constraint. A special case of this problem is studied by Streeter et al.(2009). They prove a \(O(T^{4/5})\) \((1−1/e)\)-regret upper bound. For the bandit sequential submodular maximization, the existing work proves an \(O(T^{2/3})\) regret with a suboptimal \(1/2\) approximation ratio (Niazadeh et al. 2021).