The pure exploration problem with general reward functions depending on full distributions
Machine Learning |
In this paper, we study the pure exploration model with general distribution functions, which means that the reward function of each arm depends on the whole distribution, not only its mean. We adapt the elimination framework and the LUCB framework to solve this problem, and design algorithms for estimating the value of the reward functions with different types of distributions. Then we show that our estimation methods have the correctness guarantee with proper parameters, and obtain sample complexity upper bounds for them. The order of the complexity upper bounds matches the complexity lower bound when the error constraint