Autonomous Motion

Scaling Reinforcement Learning Paradigms for Motor Control

2003

Conference Paper


Reinforcement learning offers a general framework to explain reward-related learning in artificial and biological motor control. However, current reinforcement learning methods rarely scale to high-dimensional movement systems and mainly operate in discrete, low-dimensional domains such as game playing and artificial toy problems. This drawback makes them unsuitable for application to human or bio-mimetic motor control. In this poster, we examine promising approaches that can potentially scale and suggest a novel formulation of the actor-critic algorithm that takes steps towards alleviating the current shortcomings. We argue that methods based on greedy policies are unlikely to scale to high-dimensional domains, as they are problematic when used with function approximation – a must when dealing with continuous domains. We adopt the path of direct policy-gradient-based policy improvements, since these avoid the destabilizing dynamics encountered in traditional value-iteration-based updates. While regular policy gradient methods have demonstrated promising results in the domain of humanoid motor control, we demonstrate that these methods can be significantly improved by using the natural policy gradient instead of the regular policy gradient. Based on this, we prove that Kakade's 'average natural policy gradient' is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges with probability one to the nearest local minimum of the cost function in Riemannian space. It outperforms non-natural policy gradients by far in a cart-pole balancing evaluation and offers a promising route towards reinforcement learning for truly high-dimensional continuous state-action systems.

Keywords: reinforcement learning, neurodynamic programming, actor-critic methods, policy gradient methods, natural policy gradient
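As a concrete illustration of the regular-versus-natural-gradient contrast described in the abstract, the Python sketch below estimates both quantities for a linear-Gaussian policy on a toy quadratic-reward task with deliberately ill-scaled state features. The task, policy, and all names in the code are illustrative assumptions, not the paper's setup; in particular, the Natural Actor-Critic obtains the natural gradient via a critic with compatible function approximation, whereas this sketch simply estimates the Fisher information matrix from sampled score functions and solves against it.

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                      # fixed exploration std of the Gaussian policy
scale = np.array([1.0, 5.0])     # deliberately ill-scaled state features (toy assumption)

def score(theta, s, a):
    # Gradient of log pi(a|s) w.r.t. theta for a Gaussian policy
    # with mean theta @ s and fixed standard deviation sigma.
    return (a - theta @ s) * s / sigma**2

theta = np.zeros(2)
for step in range(200):
    grads, rewards = [], []
    for _ in range(100):                        # batch of one-step rollouts
        s = rng.normal(size=2) * scale
        a = theta @ s + sigma * rng.normal()    # sample action from the policy
        grads.append(score(theta, s, a))
        rewards.append(-(a + s[0]) ** 2)        # toy reward; optimum is theta = (-1, 0)
    G, R = np.array(grads), np.array(rewards)
    g = (G * (R - R.mean())[:, None]).mean(axis=0)    # regular ("vanilla") policy gradient
    F = G.T @ G / len(G)                              # Fisher information estimate
    nat_g = np.linalg.solve(F + 1e-6 * np.eye(2), g)  # natural gradient F^{-1} g
    theta += 0.05 * nat_g

print("learned theta:", theta)   # close to [-1, 0]

Because the Fisher matrix whitens the score statistics, the natural-gradient update is invariant to the scaling of the state features; a vanilla-gradient update with the same step size would be unstable on the badly scaled second feature, which is the kind of conditioning problem the natural gradient is meant to remove.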

Author(s): Peters, J. and Vijayakumar, S. and Schaal, S.
Book Title: JSNC 2003
Journal: Proceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003)
Volume: 10
Pages: 1-7
Year: 2003
Month: May

Department(s): Empirical Inference
Bibtex Type: Conference Paper (inproceedings)

Event Name: 10th Joint Symposium on Neural Computation (JSNC 2003)
Event Place: Irvine, CA, USA

Digital: 0
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik


BibTeX

@inproceedings{5058,
  title = {Scaling Reinforcement Learning Paradigms for Motor Control},
  author = {Peters, J. and Vijayakumar, S. and Schaal, S.},
  booktitle = {JSNC 2003},
  journal = {Proceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003)},
  volume = {10},
  pages = {1--7},
  address = {Irvine, CA, USA},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = may,
  year = {2003}
}