Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing
- Xi Chen
- Qihang Lin
- Denny Zhou
Proceedings of the 30th International Conference on Machine Learning (ICML)
Published by JMLR W&CP
We consider the budget allocation problem in binary and multi-class crowd labeling, where each label from the crowd has a cost. Because instances differ in ambiguity and workers differ in reliability, a fundamental challenge is how to allocate a fixed budget among instance-worker pairs so that overall accuracy is maximized. We start with a simple setting in which all workers are assumed to be noiseless and identical, and formulate the problem as a Bayesian Markov Decision Process (MDP). In principle, dynamic programming (DP) yields the optimal allocation policy for any given budget, but it is computationally intractable. To address this, we propose a new approximate policy, called optimistic knowledge gradient, and establish its consistency. We then extend the MDP framework to incorporate the estimation of workers’ reliabilities into the allocation process. Experiments on simulated and real data demonstrate the superiority of our policy in different crowd labeling tasks.
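To make the simple setting concrete, the following is a minimal sketch of an optimistic, one-step-lookahead allocation rule for binary labels. It is an illustration built on assumptions, not the authors' implementation. The abstract does not specify the model, so the sketch assumes that each instance has an unknown probability of receiving a positive label, that this probability has a Beta prior with integer parameters, and that the reward of a label is the change in the expected accuracy of the instance's inferred class. All function names (`prob_positive`, `accuracy`, `optimistic_gain`, `allocate`) are hypothetical.

```python
import random
from math import comb


def prob_positive(a, b):
    """P(theta >= 0.5) for theta ~ Beta(a, b), with positive integers a, b.

    Uses the identity between the Beta CDF and a binomial tail, which
    holds for integer parameters, so no external library is needed.
    """
    n = a + b - 1
    return sum(comb(n, k) for k in range(a)) / 2 ** n


def accuracy(a, b):
    """Expected accuracy when the more probable class is chosen (assumed reward basis)."""
    p = prob_positive(a, b)
    return max(p, 1 - p)


def optimistic_gain(a, b):
    """Larger of the two one-step accuracy gains: next label positive or negative."""
    now = accuracy(a, b)
    return max(accuracy(a + 1, b) - now, accuracy(a, b + 1) - now)


def allocate(thetas, budget, seed=0):
    """Spend `budget` labels one at a time on the instance with the largest optimistic gain.

    `thetas` are simulated ground-truth positive-label probabilities; they are
    used only to draw labels, never by the selection rule itself.
    """
    rng = random.Random(seed)
    counts = [[1, 1] for _ in thetas]  # assumed uniform Beta(1, 1) prior per instance
    for _ in range(budget):
        i = max(range(len(thetas)), key=lambda j: optimistic_gain(*counts[j]))
        if rng.random() < thetas[i]:
            counts[i][0] += 1  # observed a positive label
        else:
            counts[i][1] += 1  # observed a negative label
    return counts


if __name__ == "__main__":
    posterior = allocate([0.9, 0.5, 0.1], budget=12)
    print(posterior)  # posterior [a, b] counts per instance
```

The rule is greedy, so each step costs time linear in the number of instances, in contrast to the exact DP solution that the abstract describes as intractable. Taking the larger of the two possible gains, rather than their expectation, is what makes the rule optimistic in this sketch.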