Counter-Factual Reinforcement Learning: How to Model Decision-Makers That Anticipate the Future
- Ritchie Lee,
- David Wolpert,
- Scott Backhaus,
- Russell Bent,
- Brendan Tracey
Chapter 4 in Decision Making and Imperfection
Published by Springer | 2013, Vol. 474
ISBN: 978-3-642-36406-8
This chapter introduces a novel framework for modeling interacting humans in a multi-stage game. This "iterated semi network-form game" framework has the following desirable characteristics: (1) bounded rational players, (2) strategic players (i.e., players account for one another's reward functions when predicting one another's behavior), and (3) computational tractability even on real-world systems. We achieve these benefits by combining concepts from game theory and reinforcement learning. To be precise, we extend the bounded rational "level-K reasoning" model to apply to games over multiple stages. Our extension allows the decomposition of the overall modeling problem into a series of smaller ones, each of which can be solved by standard reinforcement learning algorithms. We call this hybrid approach "level-K reinforcement learning". We investigate these ideas in a cyber battle scenario over a smart power grid and discuss the relationship between the behavior predicted by our model and what one might expect of real human defenders and attackers.
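Purely to illustrate the decomposition described in the abstract, the minimal sketch below chains level-K best responses via tabular Q-learning on a toy two-player matrix game. Every name and number in it (SimpleGame, level0_policy, train_level_k, the payoff table, the learning parameters) is a hypothetical stand-in rather than the chapter's construction, which studies a much richer smart-grid cyber-battle scenario: level 0 acts non-strategically, and each level k is trained against the frozen level-(k-1) policy, so each level is one standard reinforcement learning problem.

```python
# Hypothetical sketch of level-K reinforcement learning on a tiny two-player
# one-shot matrix game. All names and payoffs are illustrative assumptions,
# not taken from the chapter.
import random
from collections import defaultdict

class SimpleGame:
    """Two-player game with a small joint-action payoff table."""
    ACTIONS = [0, 1]
    # PAYOFFS[(a0, a1)] = (reward to player 0, reward to player 1)
    PAYOFFS = {(0, 0): (3, 3), (0, 1): (0, 4), (1, 0): (4, 0), (1, 1): (1, 1)}

    def step(self, a0, a1):
        return self.PAYOFFS[(a0, a1)]

def level0_policy(_state=None):
    """Level-0 players are non-strategic; here they act uniformly at random."""
    return random.choice(SimpleGame.ACTIONS)

def train_level_k(game, opponent_policy, episodes=5000, eps=0.1, alpha=0.1):
    """Learn a best response to a frozen level-(k-1) opponent with Q-learning."""
    q = defaultdict(float)
    for _ in range(episodes):
        a_opp = opponent_policy()
        # epsilon-greedy action selection for the learner (player 0)
        if random.random() < eps:
            a = random.choice(game.ACTIONS)
        else:
            a = max(game.ACTIONS, key=lambda x: q[x])
        r, _ = game.step(a, a_opp)
        q[a] += alpha * (r - q[a])  # one-shot interaction, so no bootstrap term
    best = max(game.ACTIONS, key=lambda x: q[x])
    return lambda _state=None: best  # frozen level-k policy

game = SimpleGame()
policy = level0_policy
for k in range(1, 3):               # build level-1 and level-2 policies in turn
    policy = train_level_k(game, policy)
print("level-2 action:", policy())
```

In the chapter's multi-stage setting, the same outer loop would presumably run over full game trajectories rather than a one-shot payoff table, with each player's level-k policy trained by a standard reinforcement learning algorithm while all other players are held fixed at level k-1.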
© Springer-Verlag Berlin Heidelberg 2013