Contextual Bandits with Linear Payoff Functions
- Wei Chu
- Lihong Li
- Lev Reyzin
- Robert E. Schapire
Published in Artificial Intelligence and Statistics (AISTATS)
In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. For T rounds, K actions, and d-dimensional feature vectors, we prove an O(√(Td ln³(KT ln(T)/δ))) regret bound that holds with probability 1 − δ for the simplest known (both conceptually and computationally) efficient upper confidence bound algorithm for this problem. We also prove a lower bound of Ω(√(Td)) for this setting, matching the upper bound up to logarithmic factors.
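To make the setting concrete, below is a minimal sketch of a LinUCB-style upper confidence bound rule for linear payoffs: each arm keeps a ridge-regularized design matrix and picks the arm maximizing "estimated payoff + confidence width." This is an illustrative sketch only, not the exact algorithm or confidence construction analyzed in the paper; the exploration parameter `alpha`, the toy reward model, and the helper names are assumptions for the example.

```python
import numpy as np

def linucb_choose(A_list, b_list, contexts, alpha):
    """Pick the arm with the highest upper confidence bound on its linear payoff.

    A_list[a] : d x d design matrix for arm a (identity plus sum of x x^T observed for that arm)
    b_list[a] : d-vector of accumulated reward-weighted features for arm a
    contexts  : K x d array, one feature vector per arm for the current round
    alpha     : exploration width (hypothetical tuning parameter, not the paper's constant)
    """
    ucb_scores = []
    for A, b, x in zip(A_list, b_list, contexts):
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b                    # ridge-regression estimate of the payoff weights
        mean = x @ theta_hat                     # estimated expected payoff for this arm
        width = alpha * np.sqrt(x @ A_inv @ x)   # confidence width from the design matrix
        ucb_scores.append(mean + width)
    return int(np.argmax(ucb_scores))

def linucb_update(A_list, b_list, arm, x, reward):
    """After observing the reward, update the chosen arm's statistics."""
    A_list[arm] += np.outer(x, x)
    b_list[arm] += reward * x

# Toy usage: K arms, d-dimensional contexts, noisy linear rewards (assumed model).
K, d, T = 5, 4, 1000
rng = np.random.default_rng(0)
theta_true = rng.normal(size=d)
A_list = [np.eye(d) for _ in range(K)]
b_list = [np.zeros(d) for _ in range(K)]
for t in range(T):
    contexts = rng.normal(size=(K, d))
    arm = linucb_choose(A_list, b_list, contexts, alpha=1.0)
    reward = contexts[arm] @ theta_true + 0.1 * rng.normal()
    linucb_update(A_list, b_list, arm, contexts[arm], reward)
```

The per-round cost is dominated by a d × d matrix inversion per arm, which is what makes this family of upper confidence bound methods computationally efficient in the sense referenced in the abstract.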