Foundations and Trends (R) in Robotics
A Survey on Policy Search for Robotics
by Marc Peter Deisenroth, Gerhard Neumann, and Jan Peters
Published 30 August 2013
Policy search is a subfield of Reinforcement Learning (RL) that focuses on finding good parameters for a given policy parameterization. It is well suited for robotics as it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning.
A Survey on Policy Search for Robotics reviews recent successes of both model-free and model-based policy search in robot learning. Model-free policy search is a general approach to learning policies directly from sampled trajectories. This text classifies model-free methods based on their policy evaluation, policy update, and exploration strategies, and presents a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, each sampled trajectory requires interaction with the robot, which can be time-consuming and challenging.
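To make the model-free loop of exploration, policy evaluation, and policy update concrete, here is a minimal sketch of episodic policy search on a toy point-mass task. The task, the linear policy, and the cross-entropy-style update are illustrative assumptions and are not specific algorithms taken from the survey.

```python
# Sketch of episodic, model-free policy search: explore in parameter space,
# evaluate candidates from sampled trajectories, and update the search distribution.
import numpy as np

def rollout(theta, horizon=50):
    """Policy evaluation: return of a linear policy u = theta @ x on a toy point mass."""
    x = np.array([1.0, 0.0])                                 # position, velocity
    ret = 0.0
    for _ in range(horizon):
        u = float(theta @ x)
        x = np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])    # assumed toy dynamics
        ret += -(x[0] ** 2 + 0.01 * u ** 2)                  # quadratic cost as reward
    return ret

mean, std = np.zeros(2), np.ones(2)
for _ in range(30):
    thetas = mean + std * np.random.randn(20, 2)             # exploration in parameter space
    returns = np.array([rollout(th) for th in thetas])       # evaluation from sampled trajectories
    elite = thetas[np.argsort(returns)[-5:]]                 # keep the best candidates
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3 # policy update
print("learned gains:", mean)
```

Note that every candidate evaluation corresponds to a real trajectory on the robot, which is exactly the interaction cost the survey highlights.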
Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. A Survey on Policy Search for Robotics reviews the properties of both model-free and model-based policy search methods and their applicability to robotic systems. It is an invaluable reference for anyone working in the area.
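The structural idea can be sketched under the same toy-task assumptions as above: interact with the real system only to collect transition data, fit a forward model, and then evaluate and update the policy entirely on model-simulated trajectories. The linear least-squares model below is a placeholder; the methods reviewed in the survey use far richer models, such as Gaussian processes.

```python
# Sketch of model-based policy search: learn a dynamics model from data,
# then run policy search on simulated rollouts instead of the real system.
import numpy as np

def real_step(x, u):
    """Ground-truth toy dynamics, used only while collecting data."""
    return np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])

# 1) Collect transitions from the real system with exploratory random actions.
data, x = [], np.array([1.0, 0.0])
for _ in range(200):
    u = np.random.randn()
    x_next = real_step(x, u)
    data.append((x, u, x_next))
    x = x_next

# 2) Learn a forward model x' ~= A @ [x; u] by least squares.
Z = np.array([np.append(s, a) for s, a, _ in data])
Y = np.array([sn for _, _, sn in data])
A = np.linalg.lstsq(Z, Y, rcond=None)[0].T

# 3) Policy search on the learned model: no further robot interaction is needed.
def simulated_return(theta, horizon=50):
    x, ret = np.array([1.0, 0.0]), 0.0
    for _ in range(horizon):
        u = float(theta @ x)
        x = A @ np.append(x, u)                              # model-predicted next state
        ret += -(x[0] ** 2 + 0.01 * u ** 2)
    return ret

mean, std = np.zeros(2), np.ones(2)
for _ in range(30):
    thetas = mean + std * np.random.randn(20, 2)
    returns = np.array([simulated_return(th) for th in thetas])
    elite = thetas[np.argsort(returns)[-5:]]
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
print("gains learned from the model:", mean)
```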