Regressing Deterministic Plans for MDP Function Approximation
- Andrey Kolobov,
- Mausam,
- Daniel S. Weld
ICAPS 2008 Workshop on A Reality Check for Planning and Scheduling Under Uncertainty
Most dynamic programming methods for solving MDPs store the value function explicitly as a table of state-value pairs. However, since the size of the state space is exponential in the number of state variables, these methods often run out of memory even on medium-sized problems. In response, researchers have suggested two ways to learn a compact approximation of the value function: by combining a set of basis functions, or by abstraction. While these methods have proved successful in continuous domains and in a few logically-specified domains with regular structure, no one has successfully applied them to probabilistic IPC domains.
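For concreteness, the basis-function paradigm amounts to representing the value function as a weighted combination of state features, so that only the weights, rather than a per-state table, must be stored. The small sketch below illustrates the idea; the state variables, features, and weights are hypothetical and not taken from the paper.

```python
# Minimal sketch of basis-function value approximation: V(s) is estimated as a
# weighted combination of feature values, so memory scales with the number of
# features rather than the number of states. All names here are illustrative.

def approx_value(state, basis_functions, weights):
    """Estimate V(state) as a weighted sum of basis-function values."""
    return sum(w * b(state) for b, w in zip(basis_functions, weights))

state = {"holding_block_a": True, "block_b_clear": False}
basis_functions = [lambda s: float(s["holding_block_a"]),
                   lambda s: float(s["block_b_clear"])]
weights = [0.7, 0.3]
print(approx_value(state, basis_functions, weights))  # 0.7
```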
This paper fuses these two paradigms into a novel value-function approximation scheme, which enables the solution of large problems from the Competition. Our method, RETRASE, has three steps. First, we use a fast classical planner, such as LPG-d, to generate a set of diverse plans for a determinized version of the probabilistic domain. Second, we regress these plans to generate a small set of basis functions, which induce overlapping neighborhoods over the state space. Third, we use a modified RTDP procedure to learn weights for each basis function (not for the whole space of reachable states). Finally, an agent can act in the domain by using the applicable basis functions to estimate the values of successor states and thereby choose a good policy. Preliminary experiments show that RETRASE performs well on hard problems that challenge other planners.
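To make the pipeline above concrete, the sketch below regresses a tiny deterministic plan into conjunctive basis functions, estimates a state's value from the weights of the basis functions that hold in it, and applies one RTDP-style weight update. The toy domain, the min-aggregation of weights, and the update rule are assumptions chosen to keep the example runnable; they are not claimed to match RETRASE's exact formulation.

```python
# Hedged sketch of the regress-then-learn-weights idea described above.
# Basis functions are conjunctions of literals obtained by STRIPS-style
# regression of a deterministic plan; each carries a learned weight.

def regress_plan(goal, plan):
    """Regress the goal backwards through the plan, producing one
    conjunctive basis function (a set of literals) per plan suffix."""
    basis_functions = []
    condition = set(goal)
    for action in reversed(plan):
        # Drop the action's add effects, require its preconditions.
        condition = (condition - set(action["add"])) | set(action["pre"])
        basis_functions.append(frozenset(condition))
    return basis_functions

def value(state, weights):
    """Cost-to-go estimate from the basis functions that hold in `state`
    (min-aggregation is an assumption made for this sketch)."""
    applicable = [w for bf, w in weights.items() if bf <= state]
    return min(applicable) if applicable else float("inf")

def rtdp_update(state, action_cost, successors, weights, alpha=0.5):
    """One Bellman-backup-style update of the weights of the basis
    functions applicable in `state`, given weighted successor states."""
    target = action_cost + sum(p * value(s, weights) for p, s in successors)
    for bf in weights:
        if bf <= state:
            weights[bf] += alpha * (target - weights[bf])

# Hypothetical two-step plan to the goal {g}: a1 achieves q from p, a2
# achieves g from q.
plan = [{"pre": {"p"}, "add": {"q"}},
        {"pre": {"q"}, "add": {"g"}}]
basis = regress_plan(goal={"g"}, plan=plan)      # [{q}, {p}]
weights = {bf: 0.0 for bf in basis}              # learned by RTDP-style trials

state = frozenset({"p"})
print(value(state, weights))                     # 0.0 before any updates
rtdp_update(state, 1.0, [(1.0, frozenset({"p", "q"}))], weights)
print(value(state, weights))                     # 0.5, moved toward 1 + V({p, q})
```

Because the weights attach to basis functions rather than to individual states, the number of learned parameters scales with the number of regressed plan suffixes, not with the size of the state space.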