Exact Exponent in Optimal Rates for Crowdsourcing
- Chao Gao
- Yu Lu
- Denny Zhou
Proceedings of the 33rd International Conference on Machine Learning
Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(π), where m is the number of workers and I(π) is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement m ≥ (1/I(π)) log(1/ϵ) in order to achieve an ϵ misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.
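To make the sample size requirement concrete, the following is a minimal sketch for an assumed special case that the abstract does not spell out: binary labels where worker i is correct with probability p_i regardless of the true label. In that case the Chernoff information between Bernoulli(p) and Bernoulli(1 - p) is -log(2 sqrt(p(1 - p))), and I(π) is taken here as its average over workers. The function names are hypothetical, and the resulting worker count ignores lower-order terms in the exponent, so it should be read as a rough guide rather than a guarantee.

```python
import math


def chernoff_information_one_coin(p):
    """Chernoff information between Bernoulli(p) and Bernoulli(1 - p).

    Assumption: binary labels, worker is correct with probability p
    whatever the true label is. By symmetry the optimizing exponent is
    t = 1/2, which gives -log(2 * sqrt(p * (1 - p))).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -math.log(2.0 * math.sqrt(p * (1.0 - p)))


def average_chernoff_information(accuracies):
    """Average the per-worker Chernoff information over a list of accuracies."""
    if not accuracies:
        raise ValueError("need at least one worker")
    total = sum(chernoff_information_one_coin(p) for p in accuracies)
    return total / len(accuracies)


def workers_needed(avg_info, eps):
    """Smallest integer m with m >= (1 / avg_info) * log(1 / eps)."""
    if avg_info <= 0.0:
        raise ValueError("average Chernoff information must be positive")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 / eps) / avg_info)


if __name__ == "__main__":
    # Hypothetical crowd in which every worker is right 80% of the time.
    info = average_chernoff_information([0.8] * 10)
    print("average Chernoff information:", info)
    print("workers needed for eps = 0.01:", workers_needed(info, 0.01))
```

A worker with p = 0.5 contributes zero information, and a worker with p below 0.5 contributes as much as one with accuracy 1 - p, which matches the intuition that a consistently wrong worker is as useful as a consistently right one once the bias is known.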