VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Luisa Zintgraf; Kyriacos Shiarlis; Maximilian Igl; Sebastian Schulze; Yarin Gal; Katja Hofmann; Shimon Whiteson

Eighth International Conference on Learning Representations (ICLR) | April 2020

Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but also on the agent’s uncertainty about the environment. However, computing a Bayes-optimal policy is intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment and to incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We also evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher return during training than existing methods.
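The central idea above, that a Bayes-adaptive agent acts on its belief about the task and not only on the current state, can be made concrete in a toy case where the posterior is exact. The sketch below is not variBAD. It assumes a hypothetical grid world with a single hidden goal cell and a reward that is positive only at that cell, keeps the exact posterior over goal cells with Bayes' rule, and picks the next cell with a simple belief-conditioned heuristic that is not Bayes-optimal. The names `uniform_prior`, `update_belief` and `next_target` are illustrative only. variBAD's contribution is to replace this exact posterior, which is intractable in general, with a meta-learned approximate one.

```python
from fractions import Fraction
from typing import Dict, Tuple

Cell = Tuple[int, int]
Belief = Dict[Cell, Fraction]


def uniform_prior(width: int, height: int) -> Belief:
    """Prior b_0: every cell is equally likely to be the hidden goal."""
    cells = [(x, y) for y in range(height) for x in range(width)]
    return {c: Fraction(1, len(cells)) for c in cells}


def update_belief(belief: Belief, visited: Cell, reward: float) -> Belief:
    """Exact Bayes update with a deterministic likelihood.

    Assumed (hypothetical) reward model: reward > 0 iff `visited` is the goal.
    A goal hypothesis keeps its prior mass if it explains the observed reward,
    gets zero mass otherwise, and the result is renormalised.
    """
    unnormalised = {
        g: (p if (g == visited) == (reward > 0) else Fraction(0))
        for g, p in belief.items()
    }
    z = sum(unnormalised.values())
    if z == 0:
        raise ValueError("observation has zero probability under the belief")
    return {g: p / z for g, p in unnormalised.items()}


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def next_target(state: Cell, belief: Belief) -> Cell:
    """Belief-conditioned heuristic (not Bayes-optimal): head for the nearest
    cell among those with the highest posterior probability."""
    return max(belief, key=lambda g: (belief[g], -manhattan(state, g)))


if __name__ == "__main__":
    b = uniform_prior(3, 3)                    # 9 candidate goals, 1/9 each
    b = update_belief(b, (0, 0), reward=0.0)   # no reward: (0, 0) is ruled out
    print(b[(0, 0)], b[(1, 0)])                # 0 1/8
    print(next_target((0, 0), b))              # (1, 0)
```

Two agents standing on the same cell can choose different actions here if their beliefs differ, which is the sense in which the policy conditions on task uncertainty. Fractions keep the posterior exact so the example is easy to check; a learned approximate posterior would typically be a parametric distribution over a latent task variable instead.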