Postponed Updates for Temporal-Difference Reinforcement Learning
- Harm van Seijen,
- Shimon Whiteson
Ninth International Conference on Intelligent Systems Design and Applications, ISDA 2009, Pisa, Italy
This paper presents postponed updates, a new strategy for TD methods that can improve sample efficiency without incurring the computational and space requirements of model-based RL. By recording the agent's last-visit experience, the agent can delay its update until the given state is revisited, thereby improving the quality of the update. Experimental results demonstrate that postponed updates outperforms several competitors, most notably eligibility traces, a traditional way to improve the sample efficiency of TD methods. It achieves this without the need to tune an extra parameter, as eligibility traces require.
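To make the abstract's core idea concrete, here is a minimal Python sketch of postponed updates applied to tabular Q-learning. It is an illustration under stated assumptions, not the authors' implementation: the class name `PostponedQAgent`, the parameter defaults, and the terminal-state convention (`next_state=None`) are all hypothetical. The agent records the last-visit experience for each state and applies the corresponding TD update only when that state is revisited, so the bootstrap target uses the most recent value estimates.

```python
import random
from collections import defaultdict

class PostponedQAgent:
    """Tabular Q-learning with postponed updates (illustrative sketch).

    Rather than updating Q(s, a) immediately after observing (s, a, r, s'),
    the agent records the last-visit experience for s and performs the
    update only when s is revisited, so the bootstrap target
    r + gamma * max_a' Q(s', a') reflects the latest value estimates.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.last_visit = {}  # state -> (action, reward, next_state)

    def act(self, state):
        # On revisiting a state, first apply its postponed update.
        if state in self.last_visit:
            action, reward, next_state = self.last_visit.pop(state)
            bootstrap = 0.0 if next_state is None else max(self.Q[next_state])
            target = reward + self.gamma * bootstrap
            self.Q[state][action] += self.alpha * (target - self.Q[state][action])
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        q = self.Q[state]
        return q.index(max(q))

    def observe(self, state, action, reward, next_state):
        # Record the last-visit experience instead of updating immediately;
        # pass next_state=None for terminal transitions (assumed convention).
        self.last_visit[state] = (action, reward, next_state)
```

In this sketch the update for a state is applied lazily at its next visit; any experiences still stored when learning ends could simply be flushed with ordinary Q-learning updates. Note that, unlike eligibility traces, no extra decay parameter is introduced.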