

2010


Learning optimal control solutions: a path integral approach

Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, clmc (inproceedings)

Abstract
Investigating principles of human motor control in the framework of optimal control has had a long tradition in neural control of movement, and has recently experienced a new surge of investigations. Ideally, optimal control problems are addressed as a reinforcement learning (RL) problem, which would allow investigation of both the process of acquiring an optimal control solution and the solution itself. Unfortunately, applying RL to complex neural and biomechanical systems has been largely impossible so far due to the computational difficulties that arise in high-dimensional continuous state-action spaces. As a way out, research has focused on computing optimal control solutions with iterative optimal control methods based on linear and quadratic approximations of dynamical models and cost functions. These methods require perfect knowledge of the dynamics and cost functions and rely on gradient and Newton optimization schemes. Their applicability is also restricted to low-dimensional problems due to problematic convergence in high dimensions. Moreover, the process of computing the optimal solution is removed from the learning process that might be plausible in biology. In this work, we present a new reinforcement learning method for learning optimal control solutions for motor control. This method, based on the framework of stochastic optimal control with path integrals, has a very solid theoretical foundation, while resulting in surprisingly simple learning algorithms. It is also possible to apply this approach without knowledge of the system model, and to use a wide variety of complex nonlinear cost functions for optimization. We illustrate the theoretical properties of this approach and its applicability to learning motor control tasks for reaching movements and locomotion studies. We discuss its applicability to learning desired trajectories, variable stiffness control (co-contraction), and parameterized control policies, and we also investigate its applicability to control systems with signal-dependent noise. We believe that the suggested method offers one of the easiest-to-use approaches to learning optimal control proposed in the literature so far, which makes it ideally suited for computational investigations of biological motor control.
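The cost-weighted averaging at the heart of the path-integral approach can be sketched in a few lines. The Python fragment below is a minimal, illustrative PI2-style parameter update, assuming a hypothetical rollout_cost_fn that runs the system with perturbed parameters and returns a scalar trajectory cost; it is meant to convey the idea the abstract refers to, not to reproduce the paper's exact algorithm.

import numpy as np

def pi2_update(theta, rollout_cost_fn, n_rollouts=20, noise_std=0.1, lam=1.0):
    # Sample exploration noise and evaluate the perturbed rollouts.
    eps = noise_std * np.random.randn(n_rollouts, theta.size)
    costs = np.array([rollout_cost_fn(theta + e) for e in eps])
    # Normalize costs and exponentiate: low-cost rollouts get high weight.
    s = (costs - costs.min()) / max(costs.max() - costs.min(), 1e-12)
    weights = np.exp(-s / lam)
    weights /= weights.sum()
    # The update is the probability-weighted average of the exploration noise.
    return theta + weights @ eps

Iterating this update requires only the relative ordering of trajectory costs; no gradients or dynamics models are needed, which is the model-free property emphasized in the abstract.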

[BibTex]



Constrained Accelerations for Controlled Geometric Reduction: Sagittal-Plane Decoupling for Bipedal Locomotion

Gregg, R., Righetti, L., Buchli, J., Schaal, S.

In 2010 10th IEEE-RAS International Conference on Humanoid Robots, pages: 1-7, IEEE, Nashville, USA, 2010 (inproceedings)

Abstract
Energy-shaping control methods have produced strong theoretical results for asymptotically stable 3D bipedal dynamic walking in the literature. In particular, geometric controlled reduction exploits robot symmetries to control momentum conservation laws that decouple the sagittal-plane dynamics, which are easier to stabilize. However, the associated control laws require high-dimensional matrix inverses multiplied with complicated energy-shaping terms, often making these control theories difficult to apply to highly redundant humanoid robots. This paper presents a first step towards the application of energy-shaping methods on real robots by casting controlled reduction into a framework of constrained accelerations for inverse dynamics control. By representing momentum conservation laws as constraints in acceleration space, we construct a general expression for desired joint accelerations that render the constraint surface invariant. By appropriately choosing an orthogonal projection, we show that the unconstrained (reduced) dynamics are decoupled from the constrained dynamics. Any acceleration-based controller can then be used to stabilize this planar subsystem, including passivity-based methods. The resulting control law is surprisingly simple and represents a practical way to employ control-theoretic stability results on robotic platforms. Simulated walking of a 3D compass-gait biped shows correspondence between the new and original controllers, and simulated motions of a 16-DOF humanoid demonstrate the applicability of this method.
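To make the construction concrete, here is a minimal Python sketch of the generic idea: a conservation law A(q)·qdot = const is differentiated once to obtain an acceleration-space constraint, and an orthogonal null-space projector separates the accelerations that keep the constraint surface invariant from the remaining (reduced) degrees of freedom. The matrix names and the function itself are illustrative assumptions, not the paper's notation.

import numpy as np

def constrained_accelerations(A, A_dot, qdot, qddot_free):
    # Enforce d/dt (A qdot) = A qddot + A_dot qdot = 0, i.e. the conservation
    # law written as a constraint in acceleration space.
    A_pinv = np.linalg.pinv(A)
    N = np.eye(A.shape[1]) - A_pinv @ A          # orthogonal null-space projector
    qddot_particular = -A_pinv @ (A_dot @ qdot)  # keeps the constraint surface invariant
    # Any acceleration command for the reduced dynamics passes through the
    # projector without affecting the constrained part.
    return qddot_particular + N @ qddot_free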

link (url) DOI [BibTex]



Variable impedance control - a reinforcement learning approach

Buchli, J., Theodorou, E., Stulp, F., Schaal, S.

In Robotics: Science and Systems (2010), Zaragoza, Spain, June 27-30, 2010, clmc (inproceedings)

Abstract
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high-DOF robotic tasks. In this contribution, we accomplish such gain scheduling with a reinforcement learning algorithm, PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling-based learning method derived from first principles of optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that RL on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling. We evaluate our approach by presenting results on two different simulated robotic systems, a 3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight Robot. We investigate tasks where the optimal strategy requires both tuning of the impedance of the end-effector and tuning of a reference trajectory. The results show that we can use path-integral-based RL not only for planning but also to derive variable-gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.
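As an illustration of what cost function design means for gain scheduling, the Python sketch below defines a rollout cost over a time-varying gain schedule that trades off tracking accuracy against impedance; simulate_step, the critically damped derivative gain, and the weights are hypothetical assumptions. Such a cost would be minimized with a PI2-style update like the one sketched earlier in this list, with the flattened gain schedule (and, if desired, the reference trajectory) as the policy parameters.

import numpy as np

def rollout_cost(gain_schedule, traj_ref, simulate_step, q0, qd0,
                 w_track=1.0, w_gain=1e-3):
    # gain_schedule: per-time-step proportional gains, shape (T, n_dof)
    # traj_ref:      reference trajectory, shape (T, n_dof)
    # simulate_step: hypothetical one-step dynamics (q, qd, tau) -> (q, qd)
    q, qd = q0.copy(), qd0.copy()
    cost = 0.0
    for k_p, q_ref in zip(gain_schedule, traj_ref):
        tau = k_p * (q_ref - q) - 2.0 * np.sqrt(k_p) * qd  # PD law, critically damped
        q, qd = simulate_step(q, qd, tau)
        cost += w_track * np.sum((q_ref - q) ** 2)  # penalize tracking error
        cost += w_gain * np.sum(k_p)                # penalize high impedance
    return cost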

link (url) [BibTex]



Inverse dynamics with optimal distribution of ground reaction forces for legged robots

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In Proceedings of the 13th International Conference on Climbing and Walking Robots (CLAWAR), pages: 580-587, Nagoya, Japan, September 2010 (inproceedings)

Abstract
Contact interaction with the environment is crucial in the design of locomotion controllers for legged robots, for example to prevent slipping. Therefore, it is of great importance to be able to control the effects of the robot's movements on the contact reaction forces. In this contribution, we extend a recent inverse dynamics algorithm for floating-base robots to optimize the distribution of contact forces while achieving precise trajectory tracking. The resulting controller is algorithmically simple compared to other approaches. Numerical simulations show that this approach significantly increases the range of possible movements of a humanoid robot compared to the previous inverse dynamics algorithm. We also present a simplification of the result where no inversion of the inertia matrix is needed, which is particularly relevant for practical use on a real robot. Such an algorithm becomes interesting for agile locomotion on difficult terrain where contacts with the environment are critical, such as walking over rough or slippery ground.
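The flavor of the approach can be illustrated with a small Python sketch: among all torque/contact-force pairs that realize a desired acceleration of the floating-base dynamics, pick the one minimizing a weighted quadratic norm. The names, the simple least-squares objective, and the weighted-pseudoinverse solution are assumptions made for illustration; the paper derives its own analytic distribution of contact forces rather than this generic construction.

import numpy as np

def inverse_dynamics_with_grf(M, h, S, Jc, qddot_des, w_tau=1.0, w_f=1.0):
    # Dynamics: M qddot + h = S^T tau + Jc^T F, with S the actuation selection
    # matrix (n_act x n) and Jc the contact Jacobian (k x n). Among all (tau, F)
    # realizing qddot_des, pick the minimizer of w_tau*||tau||^2 + w_f*||F||^2.
    n_act, k = S.shape[0], Jc.shape[0]
    A = np.hstack([S.T, Jc.T])                 # (n x (n_act + k))
    b = M @ qddot_des + h
    w = np.concatenate([np.full(n_act, w_tau), np.full(k, w_f)])
    Aw = A / np.sqrt(w)                        # A W^{-1/2} via column scaling
    x = (np.linalg.pinv(Aw) @ b) / np.sqrt(w)  # weighted minimum-norm solution
    return x[:n_act], x[n_act:]                # (tau, F)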

DOI [BibTex]
