

2010


Reinforcement learning of full-body humanoid motor skills

Stulp, F., Buchli, J., Theodorou, E., Schaal, S.

In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages: 405-410, December 2010, clmc (inproceedings)

Abstract
Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.

link (url) [BibTex]

Enhanced Visual Scene Understanding through Human-Robot Dialog

Johnson-Roberson, M., Bohg, J., Kragic, D., Skantze, G., Gustafson, J., Carlson, R.

In Proceedings of AAAI 2010 Fall Symposium: Dialog with Robots, November 2010 (inproceedings)

pdf [BibTex]

Scene Representation and Object Grasping Using Active Vision

Gratal, X., Bohg, J., Björkman, M., Kragic, D.

In IROS’10 Workshop on Defining and Solving Realistic Perception Problems in Personal Robotics, October 2010 (inproceedings)

Abstract
Object grasping and manipulation pose major challenges for perception and control and require rich interaction between these two fields. In this paper, we concentrate on the plethora of perceptual problems that have to be solved before a robot can be moved in a controlled way to pick up an object. A vision system is presented that integrates a number of different computational processes, e.g. attention, segmentation, recognition or reconstruction to incrementally build up a representation of the scene suitable for grasping and manipulation of objects. Our vision system is equipped with an active robotic head and a robot arm. This embodiment enables the robot to perform a number of different actions like saccading, fixating, and grasping. By applying these actions, the robot can incrementally build a scene representation and use it for interaction. We demonstrate our system in a scenario for picking up known objects from a table top. We also show the system’s extendibility towards grasping of unknown and familiar objects.

video pdf slides [BibTex]

Strategies for multi-modal scene exploration

Bohg, J., Johnson-Roberson, M., Björkman, M., Kragic, D.

In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages: 4509-4515, October 2010 (inproceedings)

Abstract
We propose a method for multi-modal scene exploration where initial object hypotheses formed by active visual segmentation are confirmed and augmented through haptic exploration with a robotic arm. We update the current belief about the state of the map with the detection results and predict yet unknown parts of the map with a Gaussian Process. We show that through the integration of different sensor modalities, we achieve a more complete scene model. We also show that the prediction of the scene structure leads to a valid scene representation even if the map is not fully traversed. Furthermore, we propose different exploration strategies and evaluate them both in simulation and on our robotic platform.

video pdf DOI Project Page [BibTex]

Attention-based active 3D point cloud segmentation

Johnson-Roberson, M., Bohg, J., Björkman, M., Kragic, D.

In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pages: 1165-1170, October 2010 (inproceedings)

Abstract
In this paper we present a framework for the segmentation of multiple objects from a 3D point cloud. We extend traditional image segmentation techniques into a full 3D representation. The proposed technique relies on a state-of-the-art min-cut framework to perform a fully 3D global multi-class labeling in a principled manner. Thereby, we extend our previous work in which a single object was actively segmented from the background. We also examine several seeding methods to bootstrap the graphical model-based energy minimization and these methods are compared over challenging scenes. All results are generated on real-world data gathered with an active vision robotic head. We present quantitative results over aggregate sets as well as visual results on specific examples.

pdf DOI [BibTex]

Relative Entropy Policy Search

Peters, J., Mülling, K., Altun, Y.

In Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI-10), pages: 1607-1612, (Editors: Fox, M. and Poole, D.), AAAI Press, Menlo Park, CA, USA, July 2010 (inproceedings)

Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It works well on typical reinforcement learning benchmark problems.

PDF Web [BibTex]

Reinforcement learning of motor skills in high dimensions: A path integral approach

Theodorou, E., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2397-2403, May 2010, clmc (inproceedings)

Abstract
Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.

link (url) [BibTex]

Inverse dynamics control of floating base systems using orthogonal decomposition

Mistry, M., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 3406-3412, May 2010, clmc (inproceedings)

Abstract
Model-based control methods can be used to enable fast, dexterous, and compliant motion of robots without sacrificing control accuracy. However, implementing such techniques on floating base robots, e.g., humanoids and legged systems, is non-trivial due to under-actuation, dynamically changing constraints from the environment, and potentially closed loop kinematics. In this paper, we show how to compute the analytically correct inverse dynamics torques for model-based control of sufficiently constrained floating base rigid-body systems, such as humanoid robots with one or two feet in contact with the environment. While our previous inverse dynamics approach relied on an estimation of contact forces to compute an approximate inverse dynamics solution, here we present an analytically correct solution by using an orthogonal decomposition to project the robot dynamics onto a reduced dimensional space, independent of contact forces. We demonstrate the feasibility and robustness of our approach on a simulated floating base bipedal humanoid robot and an actual robot dog locomoting over rough terrain.

link (url) [BibTex]

Fast, robust quadruped locomotion over challenging terrain

Kalakrishnan, M., Buchli, J., Pastor, P., Mistry, M., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2665-2670, May 2010, clmc (inproceedings)

Abstract
We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrain of varying difficulty levels. We demonstrate the generalization ability of this controller by presenting test results from an independent external test team on terrains that have never been shown to us.

link (url) [BibTex]

Accelerometer-based Tilt Estimation of a Rigid Body with only Rotational Degrees of Freedom

Trimpe, S., D’Andrea, R.

In Proceedings of the IEEE International Conference on Robotics and Automation, 2010 (inproceedings)

PDF DOI [BibTex]

Locally weighted regression for control

Ting, J., Vijayakumar, S., Schaal, S.

In Encyclopedia of Machine Learning, pages: 613-624, (Editors: Sammut, C.;Webb, G. I.), Springer, 2010, clmc (inbook)

Abstract
This article addresses two topics: learning control and locally weighted regression.

link (url) [BibTex]

Are reaching movements planned in kinematic or dynamic coordinates?

Ellmer, A., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, clmc (inproceedings)

Abstract
Whether human reaching movements are planned and optimized in kinematic (task space) or dynamic (joint or muscle space) coordinates is still an issue of debate. The first hypothesis implies that a planner produces a desired end-effector position at each point in time during the reaching movement, whereas the latter hypothesis includes the dynamics of the muscular-skeletal control system to produce a continuous end-effector trajectory. Previous work by Wolpert et al. (1995) showed that when subjects were led to believe that their straight reaching paths corresponded to curved paths as shown on a computer screen, participants adapted the true path of their hand such that they would visually perceive a straight line in visual space, even though they actually produced a curved path. These results were interpreted as supporting the stance that reaching trajectories are planned in kinematic coordinates. However, this experiment could only demonstrate that adaptation to altered paths, i.e. the position of the end-effector, did occur, but not that the precise timing of end-effector position was equally planned, i.e., the trajectory. Our current experiment aims at filling this gap by explicitly testing whether position over time, i.e. velocity, is a property of reaching movements that is planned in kinematic coordinates. In the current experiment, the velocity profiles of cursor movements corresponding to the participant's hand motions were skewed either to the left or to the right; the path itself was left unaltered. We developed an adaptation paradigm, where the skew of the velocity profile was introduced gradually and participants reported no awareness of any manipulation. Preliminary results indicate that the true hand motion of participants did not alter, i.e. there was no adaptation so as to counterbalance the introduced skew.
However, for some participants, peak hand velocities were lowered for higher skews, which suggests that participants interpreted the manipulation as mere noise due to variance in their own movement. In summary, for a visuomotor transformation task, the hypothesis of a planned continuous end-effector trajectory predicts adaptation to a modified velocity profile. The current experiment found no systematic adaptation under such a transformation, but did demonstrate an effect that is more consistent with subjects not perceiving the manipulation and instead interpreting it as an increase in noise.

[BibTex]

Optimality in Neuromuscular Systems

Theodorou, E. A., Valero-Cuevas, F.

In 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2010, clmc (inproceedings)

Abstract
We provide an overview of optimal control methods for nonlinear neuromuscular systems and discuss their limitations. Moreover, we extend current optimal control methods to neuromuscular models with realistically numerous musculotendons, as most prior work is limited to torque-driven systems. Recent work on computational motor control has explored the use of control theory and estimation as a conceptual tool to understand the underlying computational principles of neuromuscular systems. After all, successful biological systems regularly meet conditions for stability, robustness and performance for multiple classes of complex tasks. Among a variety of proposed control theory frameworks to explain this, stochastic optimal control has become a dominant framework to the point of being a standard computational technique to reproduce kinematic trajectories of reaching movements (see [12]). In particular, we demonstrate the application of optimal control to a neuromuscular model of the index finger with all seven musculotendons producing a tapping task. Our simulations include 1) a muscle model that includes force-length and force-velocity characteristics; 2) an anatomically plausible biomechanical model of the index finger that includes a tendinous network for the extensor mechanism; and 3) a contact model that is based on a nonlinear spring-damper attached at the end effector of the index finger. We demonstrate that it is feasible to apply optimal control to systems with realistically large state vectors and conclude that, while optimal control is an adequate formalism to create computational models of neuromusculoskeletal systems, there remain important challenges and limitations that need to be considered and overcome, such as contact transitions, the curse of dimensionality, and constraints on states and controls.

PDF [BibTex]

Learning Policy Improvements with Path Integrals

Theodorou, E. A., Buchli, J., Schaal, S.

In International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010, clmc (inproceedings)

Abstract
With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. We believe that Policy Improvement with Path Integrals (PI2) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.
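As a rough illustration of the roll-out-based update described in this abstract, the Python sketch below perturbs a parameter vector with exploration noise, evaluates each perturbed roll-out, and averages the perturbations with weights that decay exponentially in cost. It is a simplified, episodic variant written for this listing and should not be read as the authors' full time-indexed algorithm. The function name `pi2_style_update`, the cost-normalisation constant `h`, and the quadratic toy cost are assumptions made for the example.

```python
import numpy as np


def pi2_style_update(theta, cost_fn, rng, n_rollouts=10, sigma=0.1, h=10.0):
    """One cost-weighted averaging step (simplified, episodic sketch).

    theta   : current policy parameters, shape (d,)
    cost_fn : hypothetical callable mapping parameters to a scalar roll-out cost
    sigma   : standard deviation of the exploration noise
    h       : assumed sensitivity of the exponential cost weighting
    """
    # Sample exploration noise and evaluate one roll-out per perturbation.
    eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
    costs = np.array([cost_fn(theta + e) for e in eps])

    # Map costs to probabilities: low-cost roll-outs receive high weight.
    spread = costs.max() - costs.min()
    if spread < 1e-12:
        weights = np.full(n_rollouts, 1.0 / n_rollouts)
    else:
        weights = np.exp(-h * (costs - costs.min()) / spread)
        weights /= weights.sum()

    # Update by the probability-weighted average of the perturbations;
    # no gradient, learning rate, or matrix inversion is involved.
    return theta + weights @ eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.array([2.0, -3.0])
    toy_cost = lambda p: float(np.sum(p ** 2))  # stand-in for a roll-out cost
    for _ in range(200):
        theta = pi2_style_update(theta, toy_cost, rng)
    print("final parameters:", theta, "cost:", toy_cost(theta))
```

In this sketch the only quantity a user has to choose for the learning problem itself is the exploration noise `sigma`, which mirrors the abstract's remark about open parameters; the weighting constant `h` is an illustrative default.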

PDF [BibTex]

Learning optimal control solutions: a path integral approach

Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, clmc (inproceedings)

Abstract
Investigating principles of human motor control in the framework of optimal control has had a long tradition in neural control of movement, and has recently experienced a new surge of investigations. Ideally, optimal control problems are addressed as a reinforcement learning (RL) problem, which would allow investigating both the process of acquiring an optimal control solution as well as the solution itself. Unfortunately, the applicability of RL to complex neural and biomechanical systems has been largely impossible so far due to the computational difficulties that arise in high dimensional continuous state-action spaces. As a way out, research has focussed on computing optimal control solutions based on iterative optimal control methods that are based on linear and quadratic approximations of dynamical models and cost functions. These methods require perfect knowledge of the dynamics and cost functions while they are based on gradient and Newton optimization schemes. Their applicability is also restricted to low dimensional problems due to problematic convergence in high dimensions. Moreover, the process of computing the optimal solution is removed from the learning process that might be plausible in biology. In this work, we present a new reinforcement learning method for learning optimal control solutions for motor control. This method, based on the framework of stochastic optimal control with path integrals, has a very solid theoretical foundation, while resulting in surprisingly simple learning algorithms. It is also possible to apply this approach without knowledge of the system model, and to use a wide variety of complex nonlinear cost functions for optimization. We illustrate the theoretical properties of this approach and its applicability to learning motor control tasks for reaching movements and locomotion studies.
We discuss its applicability to learning desired trajectories, variable stiffness control (co-contraction), and parameterized control policies. We also investigate the applicability to signal dependent noise control systems. We believe that the suggested method offers one of the easiest to use approaches to learning optimal control suggested in the literature so far, which makes it ideally suited for computational investigations of biological motor control.

[BibTex]

Constrained Accelerations for Controlled Geometric Reduction: Sagittal-Plane Decoupling for Bipedal Locomotion

Gregg, R., Righetti, L., Buchli, J., Schaal, S.

In 2010 10th IEEE-RAS International Conference on Humanoid Robots, pages: 1-7, IEEE, Nashville, USA, 2010 (inproceedings)

Abstract
Energy-shaping control methods have produced strong theoretical results for asymptotically stable 3D bipedal dynamic walking in the literature. In particular, geometric controlled reduction exploits robot symmetries to control momentum conservation laws that decouple the sagittal-plane dynamics, which are easier to stabilize. However, the associated control laws require high-dimensional matrix inverses multiplied with complicated energy-shaping terms, often making these control theories difficult to apply to highly-redundant humanoid robots. This paper presents a first step towards the application of energy-shaping methods on real robots by casting controlled reduction into a framework of constrained accelerations for inverse dynamics control. By representing momentum conservation laws as constraints in acceleration space, we construct a general expression for desired joint accelerations that render the constraint surface invariant. By appropriately choosing an orthogonal projection, we show that the unconstrained (reduced) dynamics are decoupled from the constrained dynamics. Any acceleration-based controller can then be used to stabilize this planar subsystem, including passivity-based methods. The resulting control law is surprisingly simple and represents a practical way to employ control theoretic stability results in robotic platforms. Simulated walking of a 3D compass-gait biped shows correspondence between the new and original controllers, and simulated motions of a 16-DOF humanoid demonstrate the applicability of this method.

link (url) DOI [BibTex]

Variable impedance control - a reinforcement learning approach

Buchli, J., Theodorou, E., Stulp, F., Schaal, S.

In Robotics Science and Systems (2010), Zaragoza, Spain, June 27-30, 2010, clmc (inproceedings)

Abstract
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high DOF robotic tasks. In this contribution, we accomplish such gain scheduling with a reinforcement learning algorithm, PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling based learning method derived from first principles of optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that RL on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling. We evaluate our approach by presenting results on two different simulated robotic systems, a 3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight Robot. We investigate tasks where the optimal strategy requires both tuning of the impedance of the end-effector, and tuning of a reference trajectory. The results show that we can use path integral based RL not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.

link (url) [BibTex]

Inverse dynamics with optimal distribution of ground reaction forces for legged robot

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In Proceedings of the 13th International Conference on Climbing and Walking Robots (CLAWAR), pages: 580-587, Nagoya, Japan, September 2010 (inproceedings)

Abstract
Contact interaction with the environment is crucial in the design of locomotion controllers for legged robots, to prevent slipping for example. Therefore, it is of great importance to be able to control the effects of the robot's movements on the contact reaction forces. In this contribution, we extend a recent inverse dynamics algorithm for floating base robots to optimize the distribution of contact forces while achieving precise trajectory tracking. The resulting controller is algorithmically simple as compared to other approaches. Numerical simulations show that this result significantly increases the range of possible movements of a humanoid robot as compared to the previous inverse dynamics algorithm. We also present a simplification of the result where no inversion of the inertia matrix is needed, which is particularly relevant for practical use on a real robot. Such an algorithm becomes interesting for agile locomotion of robots on difficult terrains where the contacts with the environment are critical, such as walking over rough or slippery terrain.

DOI [BibTex]


2004


Learning Movement Primitives

Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.

In 11th International Symposium on Robotics Research (ISRR2003), pages: 561-572, (Editors: Dario, P. and Chatila, R.), Springer, ISRR, 2004, clmc (inproceedings)

Abstract
This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.
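To make the structure described in this abstract more concrete, here is a minimal one-dimensional sketch of a discrete DMP in Python: a decaying phase variable drives a linearly parameterized forcing term, and the weights are fit to a single demonstration. The gains `alpha_z`, `beta_z`, and `alpha_x`, the Gaussian basis layout, the plain least-squares fit (the DMP literature more commonly uses locally weighted regression), the Euler integration, and all function names are assumptions chosen for illustration, not details taken from the paper.

```python
import numpy as np


def _basis(x, centers, widths):
    """Gaussian basis activations for phase value(s) x; shape (len(x), n_basis)."""
    return np.exp(-widths * (np.atleast_1d(x)[:, None] - centers) ** 2)


def fit_dmp(y_demo, dt, n_basis=20, alpha_z=25.0, alpha_x=4.0):
    """Fit forcing-term weights to one demonstrated trajectory (supervised)."""
    y_demo = np.asarray(y_demo, dtype=float)
    beta_z = alpha_z / 4.0                      # critically damped spring
    tau = (len(y_demo) - 1) * dt                # movement duration
    y0, g = y_demo[0], y_demo[-1]
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    t = np.arange(len(y_demo)) * dt
    x = np.exp(-alpha_x * t / tau)              # canonical phase, 1 -> ~0

    # Forcing term that would reproduce the demonstration exactly.
    f_target = tau ** 2 * ydd - alpha_z * (beta_z * (g - y_demo) - tau * yd)

    # Basis centers spaced evenly in time, hence exponentially in phase.
    centers = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
    widths = 1.0 / np.diff(centers) ** 2
    widths = np.append(widths, widths[-1])

    # The forcing term is linear in the weights, so least squares suffices.
    psi = _basis(x, centers, widths)
    features = psi / psi.sum(axis=1, keepdims=True) * (x * (g - y0))[:, None]
    w, *_ = np.linalg.lstsq(features, f_target, rcond=None)
    return dict(w=w, centers=centers, widths=widths, tau=tau, y0=y0, g=g,
                alpha_z=alpha_z, beta_z=beta_z, alpha_x=alpha_x)


def rollout(dmp, dt, n_steps, goal=None):
    """Integrate the DMP with explicit Euler steps; optionally use a new goal."""
    g = dmp["g"] if goal is None else goal
    y, z, x = dmp["y0"], 0.0, 1.0
    traj = np.empty(n_steps)
    for k in range(n_steps):
        traj[k] = y
        psi = _basis(x, dmp["centers"], dmp["widths"])[0]
        f = (psi @ dmp["w"]) / psi.sum() * x * (g - dmp["y0"])
        z_dot = (dmp["alpha_z"] * (dmp["beta_z"] * (g - y) - z) + f) / dmp["tau"]
        y_dot = z / dmp["tau"]
        x_dot = -dmp["alpha_x"] * x / dmp["tau"]
        z, y, x = z + z_dot * dt, y + y_dot * dt, x + x_dot * dt
    return traj
```

A typical use would be to call `fit_dmp` on a recorded joint trajectory and then `rollout` with the original or a shifted `goal`; because the forcing term fades with the phase variable, the attractor dynamics pull the state toward the goal in either case.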

link (url) DOI [BibTex]

Learning Composite Adaptive Control for a Class of Nonlinear Systems

Nakanishi, J., Farrell, J. A., Schaal, S.

In IEEE International Conference on Robotics and Automation, pages: 2647-2652, New Orleans, LA, USA, April 2004, clmc (inproceedings)

link (url) [BibTex]

Towards Tractable Parameter-Free Statistical Learning (PhD Thesis)

D’Souza, A.

Department of Computer Science, University of Southern California, Los Angeles, 2004, clmc (phdthesis)

link (url) [BibTex]

A framework for learning biped locomotion with dynamic movement primitives

Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.

In IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids 2004), IEEE, Los Angeles, CA: Nov.10-12, Santa Monica, CA, 2004, clmc (inproceedings)

Abstract
This article summarizes our framework for learning biped locomotion using dynamical movement primitives based on nonlinear oscillators. Our ultimate goal is to establish a design principle of a controller in order to achieve natural human-like locomotion. We suggest dynamical movement primitives as a central pattern generator (CPG) of a biped robot, an approach we have previously proposed for learning and encoding complex human movements. Demonstrated trajectories are learned through movement primitives by locally weighted regression, and the frequency of the learned trajectories is adjusted automatically by a frequency adaptation algorithm based on phase resetting and entrainment of coupled oscillators. Numerical simulations and experimental implementation on a physical robot demonstrate the effectiveness of the proposed locomotion controller. Furthermore, we demonstrate that phase resetting contributes to robustness against external perturbations and environmental changes by numerical simulations and experiments.

link (url) [BibTex]

Learning Motor Primitives with Reinforcement Learning

Peters, J., Schaal, S.

In Proceedings of the 11th Joint Symposium on Neural Computation, http://resolver.caltech.edu/CaltechJSNC:2004.poster020, 2004, clmc (inproceedings)

Abstract
One of the major challenges in action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation," or more precisely, motor primitives. Recently, Ijspeert et al. [1, 2] suggested a novel framework for using nonlinear dynamical systems as motor primitives. While a lot of progress has been made in teaching these motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this poster, we evaluate how different reinforcement learning approaches can be used in order to improve the performance of motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline how these lead to a novel algorithm which is based on natural policy gradients [3]. We compare this algorithm to previous reinforcement learning algorithms in the context of dynamic motor primitive learning, and show that it outperforms these by at least an order of magnitude. We demonstrate the efficiency of the resulting reinforcement learning method for creating complex behaviors for autonomous robotics. The studied behaviors will include both discrete, finite tasks such as baseball swings, as well as complex rhythmic patterns as they occur in biped locomotion.

[BibTex]

Computational approaches to motor learning by imitation

Schaal, S., Ijspeert, A., Billard, A.

In The Neuroscience of Social Interaction, (1431):199-218, (Editors: Frith, C. D.;Wolpert, D.), Oxford University Press, Oxford, 2004, clmc (inbook)

Abstract
Movement imitation requires a complex set of mechanisms that map an observed movement of a teacher onto one's own movement apparatus. Relevant problems include movement recognition, pose estimation, pose tracking, body correspondence, coordinate transformation from external to egocentric space, matching of observed against previously learned movement, resolution of redundant degrees-of-freedom that are unconstrained by the observation, suitable movement representations for imitation, modularization of motor control, etc. All of these topics by themselves are active research problems in computational and neurobiological sciences, such that their combination into a complete imitation system remains a daunting undertaking - indeed, one could argue that we need to understand the complete perception-action loop. As a strategy to untangle the complexity of imitation, this paper will examine imitation purely from a computational point of view, i.e. we will review statistical and mathematical approaches that have been suggested for tackling parts of the imitation problem, and discuss their merits, disadvantages and underlying principles. Given the focus on action recognition of other contributions in this special issue, this paper will primarily emphasize the motor side of imitation, assuming that a perceptual system has already identified important features of a demonstrated movement and created their corresponding spatial information. Based on the formalization of motor control in terms of control policies and their associated performance criteria, useful taxonomies of imitation learning can be generated that clarify different approaches and future research directions.

link (url) [BibTex]

2001

Humanoid oculomotor control based on concepts of computational neuroscience

Shibata, T., Vijayakumar, S., Conradt, J., Schaal, S.

In Humanoids2001, Second IEEE-RAS International Conference on Humanoid Robots, 2001, clmc (inproceedings)

Abstract
Oculomotor control in a humanoid robot faces problems similar to those of biological oculomotor systems, i.e., the stabilization of gaze in the face of unknown perturbations of the body, selective attention, the complexity of stereo vision, and dealing with large information processing delays. In this paper, we suggest control circuits to realize three of the most basic oculomotor behaviors - the vestibulo-ocular and optokinetic reflex (VOR-OKR) for gaze stabilization, smooth pursuit for tracking moving objects, and saccades for overt visual attention. Each of these behaviors was inspired by computational neuroscience, which proves to be a viable strategy to explore novel control mechanisms for humanoid robotics. Our implementations on a humanoid robot demonstrate good performance of the oculomotor behaviors that appears natural and human-like.

link (url) [BibTex]

Trajectory formation for imitation with nonlinear dynamical systems

Ijspeert, A., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), pages: 752-757, Wailea, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
This article explores a new approach to learning by imitation and trajectory formation by representing movements as mixtures of nonlinear differential equations with well-defined attractor dynamics. An observed movement is approximated by finding a best fit of the mixture model to its data by a recursive least squares regression technique. In contrast to non-autonomous movement representations like splines, the resultant movement plan remains an autonomous set of nonlinear differential equations that forms a control policy which is robust to strong external perturbations and that can be modified by additional perceptual variables. This movement policy remains the same for a given target, regardless of the initial conditions, and can easily be re-used for new targets. We evaluate the trajectory formation system (TFS) in the context of a humanoid robot simulation that is part of the Virtual Trainer (VT) project, which aims at supervising rehabilitation exercises in stroke-patients. A typical rehabilitation exercise was collected with a Sarcos Sensuit, a device to record joint angular movement from human subjects, and approximated and reproduced with our imitation techniques. Our results demonstrate that multi-joint human movements can be encoded successfully, and that this system allows robust modifications of the movement policy through external variables.

link (url) [BibTex]

Real-time statistical learning for robotics and human augmentation

Schaal, S., Vijayakumar, S., D’Souza, A., Ijspeert, A., Nakanishi, J.

In International Symposium on Robotics Research, (Editors: Jarvis, R. A.;Zelinsky, A.), Lorne, Victoria, Australia, Nov.9-12, 2001, clmc (inproceedings)

Abstract
Real-time modeling of complex nonlinear dynamic processes has become increasingly important in various areas of robotics and human augmentation. To address such problems, we have been developing special statistical learning methods that meet the demands of on-line learning, in particular the need for low computational complexity, rapid learning, and scalability to high-dimensional spaces. In this paper, we introduce a novel algorithm that possesses all the necessary properties by combining methods from probabilistic and nonparametric learning. We demonstrate the applicability of our methods for three different applications in humanoid robotics, i.e., the on-line learning of a full-body inverse dynamics model, an inverse kinematics model, and imitation learning. The latter application will also introduce a novel method to shape attractor landscapes of dynamical systems by means of statistical learning.

link (url) [BibTex]

Robust learning of arm trajectories through human demonstration

Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
We present a model, composed of a hierarchy of artificial neural networks, for robot learning by demonstration. The model is implemented in a dynamic simulation of a 41-degree-of-freedom humanoid for reproducing 3D human motion of the arm. Results show that the model requires little information about the desired trajectory and learns the relevant features of movement on-line. It can generalize across a small set of data to produce a qualitatively good reproduction of the demonstrated trajectory. Finally, it is shown that reproduction of the trajectory after learning is robust against perturbations.

link (url) [BibTex]

Overt visual attention for a humanoid robot

Vijayakumar, S., Conradt, J., Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
The goal of our research is to investigate the interplay between oculomotor control, visual processing, and limb control in humans and primates by exploring the computational issues of these processes with a biologically inspired artificial oculomotor system on an anthropomorphic robot. In this paper, we investigate the computational mechanisms for visual attention in such a system. Stimuli in the environment excite a dynamical neural network that implements a saliency map, i.e., a winner-take-all competition between stimuli while simultaneously smoothing out noise and suppressing irrelevant inputs. In real-time, this system computes new targets for the shift of gaze, executed by the head-eye system of the robot. The redundant degrees-of-freedom of the head-eye system are resolved through a learned inverse kinematics with optimization criterion. We also address the important issue of how to ensure that the coordinate system of the saliency map remains correct after movement of the robot. The presented attention system is built on principled modules and is generally applicable to any sensory modality.

link (url) [BibTex]

Learning inverse kinematics

D’Souza, A., Vijayakumar, S., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
Real-time control of the end-effector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates learning of inverse kinematics for resolved motion rate control (RMRC) employing an optimization criterion to resolve kinematic redundancies. Our learning approach is based on the key observations that learning an inverse of a non-uniquely invertible function can be accomplished by augmenting the input representation to the inverse model and by using a spatially localized learning approach. We apply this strategy to inverse kinematics learning and demonstrate how a recently developed statistical learning algorithm, Locally Weighted Projection Regression, allows efficient learning of inverse kinematic mappings in an incremental fashion even when input spaces become rather high dimensional. The resulting performance of the inverse kinematics is comparable to Liegeois' ([1]) analytical pseudo-inverse with optimization. Our results are illustrated with a 30 degree-of-freedom humanoid robot.

link (url) [BibTex]

Biomimetic smooth pursuit based on fast learning of the target dynamics

Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
Following a moving target with a narrow-view foveal vision system is one of the essential oculomotor behaviors of humans and humanoids. This oculomotor behavior, called "Smooth Pursuit", requires accurate tracking control which cannot be achieved by a simple visual negative feedback controller due to the significant delays in visual information processing. In this paper, we present a biologically inspired and control theoretically sound smooth pursuit controller consisting of two cascaded subsystems. One is an inverse model controller for the oculomotor system, and the other is a learning controller for the dynamics of the visual target. The latter controller learns how to predict the target's motion in head coordinates such that tracking performance can be improved. We investigate our smooth pursuit system in simulations and experiments on a humanoid robot. By using a fast on-line statistical learning network, our humanoid oculomotor system is able to acquire high performance smooth pursuit after about 5 seconds of learning despite significant processing delays in the system.

link (url) [BibTex]

1997

Learning from demonstration

Schaal, S.

In Advances in Neural Information Processing Systems 9, pages: 1040-1046, (Editors: Mozer, M. C.;Jordan, M.;Petsche, T.), MIT Press, Cambridge, MA, 1997, clmc (inproceedings)

Abstract
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases, as well as strategies for how to approach a learning problem, from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30-second-long demonstration of the human instructor.

link (url) [BibTex]

Robot learning from demonstration

Atkeson, C. G., Schaal, S.

In Machine Learning: Proceedings of the Fourteenth International Conference (ICML ’97), pages: 12-20, (Editors: Fisher Jr., D. H.), Morgan Kaufmann, Nashville, TN, July 8-12, 1997, clmc (inproceedings)

Abstract
The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration, the robot learns a reward function from the demonstration and a task model from repeated attempts to perform the task. A policy is computed based on the learned reward function and task model. Lessons learned from an implementation on an anthropomorphic robot arm using a pendulum swing-up task include: 1) simply mimicking demonstrated motions is not adequate to perform this task, 2) a task planner can use a learned model and reward function to compute an appropriate policy, 3) this model-based planning process supports rapid learning, 4) both parametric and nonparametric models can be learned and used, and 5) incorporating a task level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning.

link (url) [BibTex]

Local dimensionality reduction for locally weighted learning

Vijayakumar, S., Schaal, S.

In International Conference on Computational Intelligence in Robotics and Automation, pages: 220-225, Monterey, CA, July 10-11, 1997, clmc (inproceedings)

Abstract
Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In this paper we suggest a partial revision of the view. Based on empirical studies, it can be observed that, despite being globally high dimensional and sparse, data distributions from physical movement systems are locally low dimensional and dense. Under this assumption, we derive a learning algorithm, Locally Adaptive Subspace Regression, that exploits this property by combining a local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression. The usefulness of the algorithm and the validity of its assumptions are illustrated for a synthetic data set and data of the inverse dynamics of an actual 7-degree-of-freedom anthropomorphic robot arm.

link (url) [BibTex]

Learning tasks from a single demonstration

Atkeson, C. G., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA97), 2, pages: 1706-1712, Piscataway, NJ: IEEE, Albuquerque, NM, 20-25 April, 1997, clmc (inproceedings)

Abstract
Learning a complex dynamic robot manoeuvre from a single human demonstration is difficult. This paper explores an approach to learning from demonstration based on learning an optimization criterion from the demonstration and a task model from repeated attempts to perform the task, and using the learned criterion and model to compute an appropriate robot movement. A preliminary version of the approach has been implemented on an anthropomorphic robot arm using a pendulum swing-up task as an example.

link (url) [BibTex]

1996

A kendama learning robot based on a dynamic optimization principle

Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Rieka, O., Nakano, E., Wada, Y., Kawato, M.

In Proceedings of the International Conference on Neural Information Processing, pages: 938-942, Hong Kong, September 1996, clmc (inproceedings)

[BibTex]

From isolation to cooperation: An alternative view of a system of experts

Schaal, S., Atkeson, C. G.

In Advances in Neural Information Processing Systems 8, pages: 605-611, (Editors: Touretzky, D. S.;Mozer, M. C.;Hasselmo, M. E.), MIT Press, Cambridge, MA, 1996, clmc (inbook)

Abstract
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross validation error using second order methods. In this way, an expert is able to adjust the size and shape of the receptive field in which its predictions are valid, and also to adjust its bias on the importance of individual input dimensions. The size and shape adjustment corresponds to finding a local distance metric, while the bias adjustment accomplishes local dimensionality reduction. We derive asymptotic results for our method. In a variety of simulations we demonstrate the properties of the algorithm with respect to interference, learning speed, prediction accuracy, feature detection, and task oriented incremental learning. 

link (url) [BibTex]

1991

Ways to smarter CAD-systems

Ehrlenspiel, K., Schaal, S.

In Proceedings of ICED’91, Heurista, pages: 10-16, (Editors: Hubka), Edition, Schriftenreihe WDK 21. Zürich, 1991, clmc (inbook)

[BibTex]
