Ph.D. Defense

Multi-Source Option-Based Policy Transfer

James MacGlashan

10:00am Friday, 25 January 2013, ITE 325B

 

Reinforcement learning algorithms are very effective at learning policies (mappings from states to actions) for specific, well-defined tasks, thereby allowing an agent to learn how to behave without extensive deliberation. However, if an agent must complete a novel variant of a task that is similar to, but not exactly the same as, one for which it has already learned a policy, learning must begin anew and nothing previously learned is of any benefit. To address this challenge, I introduce novel approaches for policy transfer. Policy transfer allows the agent to follow the policy of a previously solved, but different, task (called a source task) while it is learning a new task (called a target task). Specifically, I introduce option-based policy transfer (OPT). OPT enables policy transfer by encapsulating the policy of a source task in an option (Sutton, Precup, & Singh 1999), allowing the agent to treat that policy as if it were a primitive action. A significant advantage of this approach is that when there are multiple source tasks, an option can be created for each of them, enabling the agent to transfer knowledge from multiple sources and to combine that knowledge in useful ways. Moreover, this approach allows the agent to learn in which states of the world each source task is most applicable. OPT's approach to constructing and learning with options that represent source tasks allows it to greatly outperform existing policy transfer approaches. Additionally, OPT can utilize source tasks that other forms of transfer learning for reinforcement learning cannot.
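
To make the option construction concrete, the following minimal Python sketch (with hypothetical names; it is not the dissertation's implementation) wraps a previously learned source-task policy in an option, following the options framework of Sutton, Precup, & Singh (1999). An option is defined by an initiation set, an internal policy, and a termination condition, so the target-task learner can select it alongside primitive actions:

from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable

@dataclass
class Option:
    # An option (Sutton, Precup, & Singh 1999): where it may be invoked,
    # what it does, and the probability it terminates in each state.
    initiation: Callable[[State], bool]
    policy: Callable[[State], Action]
    termination: Callable[[State], float]

def option_from_source_policy(source_policy: Callable[[State], Action],
                              source_states: Set[State]) -> Option:
    # Wrap a learned source-task policy as an option: it can be initiated
    # wherever the source policy is defined and terminates elsewhere.
    # (A sketch of the idea behind OPT, not its exact construction.)
    return Option(
        initiation=lambda s: s in source_states,
        policy=source_policy,
        termination=lambda s: 0.0 if s in source_states else 1.0,
    )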

Challenges for policy transfer include identifying which source tasks would be useful for a given target task and providing mappings between the state and action spaces of source and target tasks. That is, it may not be useful to transfer from every previously solved source task; and if a source task has a different state or action space from the target task, a mapping between these spaces must be provided. To address these challenges, I introduce object-oriented OPT (OO-OPT), which leverages object-oriented MDP (OO-MDP) (Diuk, Cohen, & Littman 2008) state representations to automatically detect related and redundant source tasks, and to provide multiple useful state and action space mappings between tasks. I also introduce methods to adapt value function approximation techniques (which are useful when the state space of a task is very large or continuous) to the unique state representation of OO-MDPs.
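As a rough illustration of the OO-MDP representation (Diuk, Cohen, & Littman 2008) that OO-OPT builds on, the Python sketch below (hypothetical names, not the dissertation's code) represents a state as a collection of objects, each an instance of a class with attribute values; object classes shared by a source and a target task suggest candidate mappings between their state spaces:

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ObjectInstance:
    obj_class: str                 # e.g., "agent", "block", "door"
    attributes: Dict[str, float]   # attribute name -> value

@dataclass
class OOState:
    objects: List[ObjectInstance]  # an OO-MDP state is a collection of objects

def shared_classes(source: OOState, target: OOState) -> List[str]:
    # Object classes appearing in both tasks; each shared class suggests
    # ways of mapping source objects onto target objects (a simplification
    # of how OO-OPT derives candidate state and action space mappings).
    src = {o.obj_class for o in source.objects}
    tgt = {o.obj_class for o in target.objects}
    return sorted(src & tgt)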

Committee: Dr. Marie desJardins (Chair), Dr. Tim Finin, Dr. Michael Littman, Dr. Tim Oates, Dr. Yun Peng