Online and Stochastic Gradient Methods for Non-decomposable Loss Functions
- Purushottam Kar,
- Harikrishna Narasimhan,
- Prateek Jain
Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS)
Published by Neural Information Processing Systems
Modern applications in sensitive domains such as biometrics and medicine frequently require the use of non-decomposable loss functions such as precision@k, F-measure, etc. Compared to point loss functions such as the hinge loss, these offer much more fine-grained control over prediction, but at the same time present novel challenges in terms of algorithm design and analysis. In this work we initiate a study of online learning techniques for such non-decomposable loss functions with the aim of enabling incremental learning as well as designing scalable solvers for batch problems. To this end, we propose an online learning framework for such loss functions. Our model enjoys several nice properties, chief amongst them being the existence of efficient online learning algorithms with sublinear regret and online-to-batch conversion bounds. Our model is a provable extension of existing online learning models for point loss functions. We instantiate two popular losses, namely precision@k and partial AUC, in our model and prove sublinear regret bounds for both of them. Our proofs require a novel structural lemma over ranked lists which may be of independent interest. We then develop scalable stochastic gradient descent solvers for non-decomposable loss functions. We show that for loss functions satisfying a certain uniform convergence property (which includes precision@k and partial AUC), our methods provably converge to the empirical risk minimizer. Extensive experiments on real-life and benchmark datasets establish that our method can be orders of magnitude faster than a recently proposed cutting plane method.
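
To make the "non-decomposable" distinction concrete, the sketch below contrasts a decomposable point loss (hinge), which is a sum of independent per-example terms, with the two performance measures instantiated in the paper: precision@k and partial AUC over a false-positive-rate range [0, β]. This is a minimal illustration only; the function names, toy data, and normalization choices are assumptions for exposition and do not reproduce the paper's algorithms or surrogates.

```python
import numpy as np

def hinge_loss(scores, labels):
    """Decomposable point loss: an average of independent per-example terms."""
    return np.mean(np.maximum(0.0, 1.0 - labels * scores))

def precision_at_k(scores, labels, k):
    """Non-decomposable: depends on the ranking induced over the whole sample.
    Fraction of true positives among the k highest-scoring examples."""
    top_k = np.argsort(-scores)[:k]
    return np.mean(labels[top_k] == 1)

def partial_auc(scores, labels, beta):
    """Non-decomposable: AUC restricted to false-positive rates in [0, beta].
    Fraction of (positive, hard-negative) pairs ranked correctly, where the
    hard negatives are the top ceil(beta * #negatives) scoring negatives."""
    pos, neg = scores[labels == 1], scores[labels == -1]
    j = max(1, int(np.ceil(beta * len(neg))))
    top_neg = np.sort(neg)[::-1][:j]  # highest-scoring (hardest) negatives
    return np.mean(pos[:, None] > top_neg[None, :])

# Toy usage with random scores (illustrative data only).
rng = np.random.default_rng(0)
labels = rng.choice([-1, 1], size=100, p=[0.9, 0.1])
scores = rng.normal(size=100) + 0.5 * (labels == 1)
print(hinge_loss(scores, labels))
print(precision_at_k(scores, labels, k=10))
print(partial_auc(scores, labels, beta=0.1))
```

Note that precision@k and partial AUC cannot be written as a sum of losses over individual examples, which is why standard pointwise online learning and SGD analyses do not apply directly and motivates the framework developed in the paper.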