Header logo is am


2018


Thumb xl screen shot 2018 04 19 at 14.57.08
Motion-based Object Segmentation based on Dense RGB-D Scene Flow

Shao, L., Shah, P., Dwaracherla, V., Bohg, J.

IEEE Robotics and Automation Letters, 3(4):3797-3804, IEEE, IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2018 (conference)

Abstract
Given two consecutive RGB-D images, we propose a model that estimates a dense 3D motion field, also known as scene flow. We take advantage of the fact that in robot manipulation scenarios, scenes often consist of a set of rigidly moving objects. Our model jointly estimates (i) the segmentation of the scene into an unknown but finite number of objects, (ii) the motion trajectories of these objects and (iii) the object scene flow. We employ an hourglass, deep neural network architecture. In the encoding stage, the RGB and depth images undergo spatial compression and correlation. In the decoding stage, the model outputs three images containing a per-pixel estimate of the corresponding object center as well as object translation and rotation. This forms the basis for inferring the object segmentation and final object scene flow. To evaluate our model, we generated a new and challenging, large-scale, synthetic dataset that is specifically targeted at robotic manipulation: It contains a large number of scenes with a very diverse set of simultaneously moving 3D objects and is recorded with a commonly-used RGB-D camera. In quantitative experiments, we show that we significantly outperform state-of-the-art scene flow and motion-segmentation methods. In qualitative experiments, we show how our learned model transfers to challenging real-world scenes, visually generating significantly better results than existing methods.

Project Page arXiv DOI [BibTex]

2018

Project Page arXiv DOI [BibTex]


Thumb xl teaser image
Probabilistic Recurrent State-Space Models

Doerr, A., Daniel, C., Schiegg, M., Nguyen-Tuong, D., Schaal, S., Toussaint, M., Trimpe, S.

In Proceedings of the International Conference on Machine Learning (ICML), International Conference on Machine Learning (ICML), July 2018 (inproceedings)

Abstract
State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time-series data. Fully probabilistic SSMs, however, unfortunately often prove hard to train, even for smaller problems. To overcome this limitation, we propose a scalable initialization and training algorithm based on doubly stochastic variational inference and Gaussian processes. In the variational approximation we propose in contrast to related approaches to fully capture the latent state temporal correlations to allow for robust training.

arXiv pdf Project Page [BibTex]

arXiv pdf Project Page [BibTex]


Thumb xl meta learning overview
Online Learning of a Memory for Learning Rates

(nominated for best paper award)

Meier, F., Kappler, D., Schaal, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018, accepted (inproceedings)

Abstract
The promise of learning to learn for robotics rests on the hope that by extracting some information about the learning process itself we can speed up subsequent similar learning tasks. Here, we introduce a computationally efficient online meta-learning algorithm that builds and optimizes a memory model of the optimal learning rate landscape from previously observed gradient behaviors. While performing task specific optimization, this memory of learning rates predicts how to scale currently observed gradients. After applying the gradient scaling our meta-learner updates its internal memory based on the observed effect its prediction had. Our meta-learner can be combined with any gradient-based optimizer, learns on the fly and can be transferred to new optimization tasks. In our evaluations we show that our meta-learning algorithm speeds up learning of MNIST classification and a variety of learning control tasks, either in batch or online learning settings.

pdf video code [BibTex]

pdf video code [BibTex]


Thumb xl learning ct w asm block diagram detailed
Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks

Sutanto, G., Su, Z., Schaal, S., Meier, F.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

pdf video [BibTex]

pdf video [BibTex]


no image
On Time Optimization of Centroidal Momentum Dynamics

Ponton, B., Herzog, A., Del Prete, A., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 5776-5782, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
Recently, the centroidal momentum dynamics has received substantial attention to plan dynamically consistent motions for robots with arms and legs in multi-contact scenarios. However, it is also non convex which renders any optimization approach difficult and timing is usually kept fixed in most trajectory optimization techniques to not introduce additional non convexities to the problem. But this can limit the versatility of the algorithms. In our previous work, we proposed a convex relaxation of the problem that allowed to efficiently compute momentum trajectories and contact forces. However, our approach could not minimize a desired angular momentum objective which seriously limited its applicability. Noticing that the non-convexity introduced by the time variables is of similar nature as the centroidal dynamics one, we propose two convex relaxations to the problem based on trust regions and soft constraints. The resulting approaches can compute time-optimized dynamically consistent trajectories sufficiently fast to make the approach realtime capable. The performance of the algorithm is demonstrated in several multi-contact scenarios for a humanoid robot. In particular, we show that the proposed convex relaxation of the original problem finds solutions that are consistent with the original non-convex problem and illustrate how timing optimization allows to find motion plans that would be difficult to plan with fixed timing † †Implementation details and demos can be found in the source code available at https://git-amd.tuebingen.mpg.de/bponton/timeoptimization.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Unsupervised Contact Learning for Humanoid Estimation and Control

Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 411-417, IEEE, Brisbane, Australia, 2018 (inproceedings)

Abstract
This work presents a method for contact state estimation using fuzzy clustering to learn contact probability for full, six-dimensional humanoid contacts. The data required for training is solely from proprioceptive sensors - endeffector contact wrench sensors and inertial measurement units (IMUs) - and the method is completely unsupervised. The resulting cluster means are used to efficiently compute the probability of contact in each of the six endeffector degrees of freedom (DoFs) independently. This clustering-based contact probability estimator is validated in a kinematics-based base state estimator in a simulation environment with realistic added sensor noise for locomotion over rough, low-friction terrain on which the robot is subject to foot slip and rotation. The proposed base state estimator which utilizes these six DoF contact probability estimates is shown to perform considerably better than that which determines kinematic contact constraints purely based on measured normal force.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Learning Task-Specific Dynamics to Improve Whole-Body Control

Gams, A., Mason, S., Ude, A., Schaal, S., Righetti, L.

In Hua, IEEE, Beijing, China, November 2018 (inproceedings)

Abstract
In task-based inverse dynamics control, reference accelerations used to follow a desired plan can be broken down into feedforward and feedback trajectories. The feedback term accounts for tracking errors that are caused from inaccurate dynamic models or external disturbances. On underactuated, free-floating robots, such as humanoids, high feedback terms can be used to improve tracking accuracy; however, this can lead to very stiff behavior or poor tracking accuracy due to limited control bandwidth. In this paper, we show how to reduce the required contribution of the feedback controller by incorporating learned task-space reference accelerations. Thus, we i) improve the execution of the given specific task, and ii) offer the means to reduce feedback gains, providing for greater compliance of the system. With a systematic approach we also reduce heuristic tuning of the model parameters and feedback gains, often present in real-world experiments. In contrast to learning task-specific joint-torques, which might produce a similar effect but can lead to poor generalization, our approach directly learns the task-space dynamics of the center of mass of a humanoid robot. Simulated and real-world results on the lower part of the Sarcos Hermes humanoid robot demonstrate the applicability of the approach.

link (url) [BibTex]

link (url) [BibTex]


no image
An MPC Walking Framework With External Contact Forces

Mason, S., Rotella, N., Schaal, S., Righetti, L.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1785-1790, IEEE, Brisbane, Australia, May 2018 (inproceedings)

Abstract
In this work, we present an extension to a linear Model Predictive Control (MPC) scheme that plans external contact forces for the robot when given multiple contact locations and their corresponding friction cone. To this end, we set up a two-step optimization problem. In the first optimization, we compute the Center of Mass (CoM) trajectory, foot step locations, and introduce slack variables to account for violating the imposed constraints on the Zero Moment Point (ZMP). We then use the slack variables to trigger the second optimization, in which we calculate the optimal external force that compensates for the ZMP tracking error. This optimization considers multiple contacts positions within the environment by formulating the problem as a Mixed Integer Quadratic Program (MIQP) that can be solved at a speed between 100-300 Hz. Once contact is created, the MIQP reduces to a single Quadratic Program (QP) that can be solved in real-time ({\textless}; 1kHz). Simulations show that the presented walking control scheme can withstand disturbances 2-3× larger with the additional force provided by a hand contact.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2014


Thumb xl screen shot 2014 07 09 at 15.49.27
Robot Arm Pose Estimation through Pixel-Wise Part Classification

Bohg, J., Romero, J., Herzog, A., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA) 2014, pages: 3143-3150, IEEE International Conference on Robotics and Automation (ICRA), June 2014 (inproceedings)

Abstract
We propose to frame the problem of marker-less robot arm pose estimation as a pixel-wise part classification problem. As input, we use a depth image in which each pixel is classified to be either from a particular robot part or the background. The classifier is a random decision forest trained on a large number of synthetically generated and labeled depth images. From all the training samples ending up at a leaf node, a set of offsets is learned that votes for relative joint positions. Pooling these votes over all foreground pixels and subsequent clustering gives us an estimate of the true joint positions. Due to the intrinsic parallelism of pixel-wise classification, this approach can run in super real-time and is more efficient than previous ICP-like methods. We quantitatively evaluate the accuracy of this approach on synthetic data. We also demonstrate that the method produces accurate joint estimates on real data despite being purely trained on synthetic data.

video code pdf DOI Project Page [BibTex]

2014

video code pdf DOI Project Page [BibTex]


no image
A Self-Tuning LQR Approach Demonstrated on an Inverted Pendulum

Trimpe, S., Millane, A., Doessegger, S., D’Andrea, R.

In Proceedings of the 19th IFAC World Congress, Cape Town, South Africa, 2014 (inproceedings)

PDF Supplementary material DOI [BibTex]

PDF Supplementary material DOI [BibTex]


no image
Learning coupling terms for obstacle avoidance

Rai, A., Meier, F., Ijspeert, A., Schaal, S.

In International Conference on Humanoid Robotics, pages: 512-518, IEEE, 2014, clmc (inproceedings)

Abstract
Autonomous manipulation in dynamic environments is important for robots to perform everyday tasks. For this, a manipulator should be capable of interpreting the environment and planning an appropriate movement. At least, two possible approaches exist for this in literature. Usually, a planning system is used to generate a complex movement plan that satisfies all constraints. Alternatively, a simple plan could be chosen and modified with sensory feedback to accommodate additional constraints by equipping the controller with features that remain dormant most of the time, except when specific situations arise. Dynamic Movement Primitives (DMPs) form a robust and versatile starting point for such a controller that can be modified online using a non-linear term, called the coupling term. This can prove to be a fast and reactive way of obstacle avoidance in a human-like fashion. We propose a method to learn this coupling term from human demonstrations starting with simple features and making it more robust to avoid a larger range of obstacles. We test the ability of our coupling term to model different kinds of obstacle avoidance behaviours in humans and use this learnt coupling term to avoid obstacles in a reactive manner. This line of research aims at pushing the boundary of reactive control strategies to more complex scenarios, such that complex and usually computationally more expensive planning methods can be avoided as much as possible.

link (url) Project Page [BibTex]

link (url) Project Page [BibTex]


no image
Incremental Local Gaussian Regression

Meier, F., Hennig, P., Schaal, S.

In Advances in Neural Information Processing Systems 27, pages: 972-980, (Editors: Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger), 28th Annual Conference on Neural Information Processing Systems (NIPS), 2014, clmc (inproceedings)

PDF link (url) [BibTex]

PDF link (url) [BibTex]


Thumb xl gene
Generalization of the tacit learning controller based on periodic tuning functions

Berenz, V., Hayashibe, M., Alnajjar, F., Shimoda, S.

In 5th IEEE RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics, pages: 893-898, 2014 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Efficient Bayesian Local Model Learning for Control

Meier, F., Hennig, P., Schaal, S.

In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, pages: 2244 - 2249, IROS, 2014, clmc (inproceedings)

Abstract
Model-based control is essential for compliant controland force control in many modern complex robots, like humanoidor disaster robots. Due to many unknown and hard tomodel nonlinearities, analytical models of such robots are oftenonly very rough approximations. However, modern optimizationcontrollers frequently depend on reasonably accurate models,and degrade greatly in robustness and performance if modelerrors are too large. For a long time, machine learning hasbeen expected to provide automatic empirical model synthesis,yet so far, research has only generated feasibility studies butno learning algorithms that run reliably on complex robots.In this paper, we combine two promising worlds of regressiontechniques to generate a more powerful regression learningsystem. On the one hand, locally weighted regression techniquesare computationally efficient, but hard to tune due to avariety of data dependent meta-parameters. On the other hand,Bayesian regression has rather automatic and robust methods toset learning parameters, but becomes quickly computationallyinfeasible for big and high-dimensional data sets. By reducingthe complexity of Bayesian regression in the spirit of local modellearning through variational approximations, we arrive at anovel algorithm that is computationally efficient and easy toinitialize for robust learning. Evaluations on several datasetsdemonstrate very good learning performance and the potentialfor a general regression learning tool for robotics.

PDF link (url) DOI [BibTex]

PDF link (url) DOI [BibTex]


no image
Stability Analysis of Distributed Event-Based State Estimation

Trimpe, S.

In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, 2014 (inproceedings)

Abstract
An approach for distributed and event-based state estimation that was proposed in previous work [1] is analyzed and extended to practical networked systems in this paper. Multiple sensor-actuator-agents observe a dynamic process, sporadically exchange their measurements over a broadcast network according to an event-based protocol, and estimate the process state from the received data. The event-based approach was shown in [1] to mimic a centralized Luenberger observer up to guaranteed bounds, under the assumption of identical estimates on all agents. This assumption, however, is unrealistic (it is violated by a single packet drop or slight numerical inaccuracy) and removed herein. By means of a simulation example, it is shown that non-identical estimates can actually destabilize the overall system. To achieve stability, the event-based communication scheme is supplemented by periodic (but infrequent) exchange of the agentsâ?? estimates and reset to their joint average. When the local estimates are used for feedback control, the stability guarantee for the estimation problem extends to the event-based control system.

PDF Supplementary material DOI Project Page [BibTex]

PDF Supplementary material DOI Project Page [BibTex]


no image
Dual Execution of Optimized Contact Interaction Trajectories

Toussaint, M., Ratliff, N., Bohg, J., Righetti, L., Englert, P., Schaal, S.

In 2014 IEEE/RSJ Conference on Intelligent Robots and Systems, pages: 47-54, IEEE, Chicago, USA, 2014 (inproceedings)

Abstract
Efficient manipulation requires contact to reduce uncertainty. The manipulation literature refers to this as funneling: a methodology for increasing reliability and robustness by leveraging haptic feedback and control of environmental interaction. However, there is a fundamental gap between traditional approaches to trajectory optimization and this concept of robustness by funneling: traditional trajectory optimizers do not discover force feedback strategies. From a POMDP perspective, these behaviors could be regarded as explicit observation actions planned to sufficiently reduce uncertainty thereby enabling a task. While we are sympathetic to the full POMDP view, solving full continuous-space POMDPs in high-dimensions is hard. In this paper, we propose an alternative approach in which trajectory optimization objectives are augmented with new terms that reward uncertainty reduction through contacts, explicitly promoting funneling. This augmentation shifts the responsibility of robustness toward the actual execution of the optimized trajectories. Directly tracing trajectories through configuration space would lose all robustness-dual execution achieves robustness by devising force controllers to reproduce the temporal interaction profile encoded in the dual solution of the optimization problem. This work introduces dual execution in depth and analyze its performance through robustness experiments in both simulation and on a real-world robotic platform.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Learning and Exploration in a Novel Dimensionality-Reduction Task

Ebert, J, Kim, S, Schweighofer, N., Sternad, D, Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2009), Amsterdam, Netherlands, 2014 (inproceedings)

[BibTex]

[BibTex]


no image
Balancing experiments on a torque-controlled humanoid with hierarchical inverse dynamics

Herzog, A., Righetti, L., Grimminger, F., Pastor, P., Schaal, S.

In 2014 IEEE/RSJ Conference on Intelligent Robots and Systems, pages: 981-988, IEEE, Chicago, USA, 2014 (inproceedings)

Abstract
Recently several hierarchical inverse dynamics controllers based on cascades of quadratic programs have been proposed for application on torque controlled robots. They have important theoretical benefits but have never been implemented on a torque controlled robot where model inaccuracies and real-time computation requirements can be problematic. In this contribution we present an experimental evaluation of these algorithms in the context of balance control for a humanoid robot. The presented experiments demonstrate the applicability of the approach under real robot conditions (i.e. model uncertainty, estimation errors, etc). We propose a simplification of the optimization problem that allows us to decrease computation time enough to implement it in a fast torque control loop. We implement a momentum-based balance controller which shows robust performance in face of unknown disturbances, even when the robot is standing on only one foot. In a second experiment, a tracking task is evaluated to demonstrate the performance of the controller with more complicated hierarchies. Our results show that hierarchical inverse dynamics controllers can be used for feedback control of humanoid robots and that momentum-based balance control can be efficiently implemented on a real robot.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Full Dynamics LQR Control of a Humanoid Robot: An Experimental Study on Balancing and Squatting

Mason, S., Righetti, L., Schaal, S.

In 2014 IEEE-RAS International Conference on Humanoid Robots, pages: 374-379, IEEE, Madrid, Spain, 2014 (inproceedings)

Abstract
Humanoid robots operating in human environments require whole-body controllers that can offer precise tracking and well-defined disturbance rejection behavior. In this contribution, we propose an experimental evaluation of a linear quadratic regulator (LQR) using a linearization of the full robot dynamics together with the contact constraints. The advantage of the controller is that it explicitly takes into account the coupling between the different joints to create optimal feedback controllers for whole-body control. We also propose a method to explicitly regulate other tasks of interest, such as the regulation of the center of mass of the robot or its angular momentum. In order to evaluate the performance of linear optimal control designs in a real-world scenario (model uncertainty, sensor noise, imperfect state estimation, etc), we test the controllers in a variety of tracking and balancing experiments on a torque controlled humanoid (e.g. balancing, split plane balancing, squatting, pushes while squatting, and balancing on a wheeled platform). The proposed control framework shows a reliable push recovery behavior competitive with more sophisticated balance controllers, rejecting impulses up to 11.7 Ns with peak forces of 650 N, with the added advantage of great computational simplicity. Furthermore, the controller is able to track squatting trajectories up to 1 Hz without relinearization, suggesting that the linearized dynamics is sufficient for significant ranges of motion.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
State Estimation for a Humanoid Robot

Rotella, N., Bloesch, M., Righetti, L., Schaal, S.

In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 952-958, IEEE, Chicago, USA, 2014 (inproceedings)

Abstract
This paper introduces a framework for state estimation on a humanoid robot platform using only common proprioceptive sensors and knowledge of leg kinematics. The presented approach extends that detailed in prior work on a point-foot quadruped platform by adding the rotational constraints imposed by the humanoid's flat feet. As in previous work, the proposed Extended Kalman Filter accommodates contact switching and makes no assumptions about gait or terrain, making it applicable on any humanoid platform for use in any task. A nonlinear observability analysis is performed on both the point-foot and flat-foot filters and it is concluded that the addition of rotational constraints significantly simplifies singular cases and improves the observability characteristics of the system. Results on a simulated walking dataset demonstrate the performance gain of the flat-foot filter as well as confirm the results of the presented observability analysis.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2006


no image
Learning operational space control

Peters, J., Schaal, S.

In Robotics: Science and Systems II (RSS 2006), pages: 255-262, (Editors: Gaurav S. Sukhatme and Stefan Schaal and Wolfram Burgard and Dieter Fox), Cambridge, MA: MIT Press, RSS , 2006, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-covexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exits when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constraint optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasability of our suggested approach.

link (url) [BibTex]

2006

link (url) [BibTex]


no image
Reinforcement Learning for Parameterized Motor Primitives

Peters, J., Schaal, S.

In Proceedings of the 2006 International Joint Conference on Neural Networks, pages: 73-80, IJCNN, 2006, clmc (inproceedings)

Abstract
One of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Policy gradient methods for robotics

Peters, J., Schaal, S.

In Proceedings of the IEEE International Conference on Intelligent Robotics Systems, pages: 2219-2225, IROS, 2006, clmc (inproceedings)

Abstract
The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-structured environments. However, to date only few existing reinforcement learning methods have been scaled into the domains of highdimensional robots such as manipulator, legged or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2004


no image
Learning Movement Primitives

Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.

In 11th International Symposium on Robotics Research (ISRR2003), pages: 561-572, (Editors: Dario, P. and Chatila, R.), Springer, ISRR, 2004, clmc (inproceedings)

Abstract
This paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.

link (url) DOI [BibTex]

2004

link (url) DOI [BibTex]


no image
Learning Composite Adaptive Control for a Class of Nonlinear Systems

Nakanishi, J., Farrell, J. A., Schaal, S.

In IEEE International Conference on Robotics and Automation, pages: 2647-2652, New Orleans, LA, USA, April 2004, 2004, clmc (inproceedings)

link (url) [BibTex]

link (url) [BibTex]


no image
A framework for learning biped locomotion with dynamic movement primitives

Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., Kawato, M.

In IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids 2004), IEEE, Los Angeles, CA: Nov.10-12, Santa Monica, CA, 2004, clmc (inproceedings)

Abstract
This article summarizes our framework for learning biped locomotion using dynamical movement primitives based on nonlinear oscillators. Our ultimate goal is to establish a design principle of a controller in order to achieve natural human-like locomotion. We suggest dynamical movement primitives as a central pattern generator (CPG) of a biped robot, an approach we have previously proposed for learning and encoding complex human movements. Demonstrated trajectories are learned through movement primitives by locally weighted regression, and the frequency of the learned trajectories is adjusted automatically by a frequency adaptation algorithm based on phase resetting and entrainment of coupled oscillators. Numerical simulations and experimental implementation on a physical robot demonstrate the effectiveness of the proposed locomotion controller. Furthermore, we demonstrate that phase resetting contributes to robustness against external perturbations and environmental changes by numerical simulations and experiments.

link (url) [BibTex]

link (url) [BibTex]


no image
Learning Motor Primitives with Reinforcement Learning

Peters, J., Schaal, S.

In Proceedings of the 11th Joint Symposium on Neural Computation, http://resolver.caltech.edu/CaltechJSNC:2004.poster020, 2004, clmc (inproceedings)

Abstract
One of the major challenges in action generation for robotics and in the understanding of human motor control is to learn the "building blocks of move- ment generation," or more precisely, motor primitives. Recently, Ijspeert et al. [1, 2] suggested a novel framework how to use nonlinear dynamical systems as motor primitives. While a lot of progress has been made in teaching these mo- tor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this poster, we evaluate different reinforcement learning approaches can be used in order to improve the performance of motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and line out how these lead to a novel algorithm which is based on natural policy gradients [3]. We compare this algorithm to previous reinforcement learning algorithms in the context of dynamic motor primitive learning, and show that it outperforms these by at least an order of magnitude. We demonstrate the efficiency of the resulting reinforcement learning method for creating complex behaviors for automous robotics. The studied behaviors will include both discrete, finite tasks such as baseball swings, as well as complex rhythmic patterns as they occur in biped locomotion

[BibTex]

[BibTex]

2000


no image
Reciprocal excitation between biological and robotic research

Schaal, S., Sternad, D., Dean, W., Kotoska, S., Osu, R., Kawato, M.

In Sensor Fusion and Decentralized Control in Robotic Systems III, Proceedings of SPIE, 4196, pages: 30-40, Boston, MA, Nov.5-8, 2000, November 2000, clmc (inproceedings)

Abstract
While biological principles have inspired researchers in computational and engineering research for a long time, there is still rather limited knowledge flow back from computational to biological domains. This paper presents examples of our work where research on anthropomorphic robots lead us to new insights into explaining biological movement phenomena, starting from behavioral studies up to brain imaging studies. Our research over the past years has focused on principles of trajectory formation with nonlinear dynamical systems, on learning internal models for nonlinear control, and on advanced topics like imitation learning. The formal and empirical analyses of the kinematics and dynamics of movements systems and the tasks that they need to perform lead us to suggest principles of motor control that later on we found surprisingly related to human behavior and even brain activity.

link (url) [BibTex]

2000

link (url) [BibTex]


no image
Nonlinear dynamical systems as movement primitives

Schaal, S., Kotosaka, S., Sternad, D.

In Humanoids2000, First IEEE-RAS International Conference on Humanoid Robots, CD-Proceedings, Cambridge, MA, September 2000, clmc (inproceedings)

Abstract
This paper explores the idea to create complex human-like movements from movement primitives based on nonlinear attractor dynamics. Each degree-of-freedom of a limb is assumed to have two independent abilities to create movement, one through a discrete dynamic system, and one through a rhythmic system. The discrete system creates point-to-point movements based on internal or external target specifications. The rhythmic system can add an additional oscillatory movement relative to the current position of the discrete system. In the present study, we develop appropriate dynamic systems that can realize the above model, motivate the particular choice of the systems from a biological and engineering point of view, and present simulation results of the performance of such movement primitives. The model was implemented for a drumming task on a humanoid robot

link (url) [BibTex]

link (url) [BibTex]


no image
Real Time Learning in Humanoids: A challenge for scalability of Online Algorithms

Vijayakumar, S., Schaal, S.

In Humanoids2000, First IEEE-RAS International Conference on Humanoid Robots, CD-Proceedings, Cambridge, MA, September 2000, clmc (inproceedings)

Abstract
While recent research in neural networks and statistical learning has focused mostly on learning from finite data sets without stringent constraints on computational efficiency, there is an increasing number of learning problems that require real-time performance from an essentially infinite stream of incrementally arriving data. This paper demonstrates how even high-dimensional learning problems of this kind can successfully be dealt with by techniques from nonparametric regression and locally weighted learning. As an example, we describe the application of one of the most advanced of such algorithms, Locally Weighted Projection Regression (LWPR), to the on-line learning of the inverse dynamics model of an actual seven degree-of-freedom anthropomorphic robot arm. LWPR's linear computational complexity in the number of input dimensions, its inherent mechanisms of local dimensionality reduction, and its sound learning rule based on incremental stochastic leave-one-out cross validation allows -- to our knowledge for the first time -- implementing inverse dynamics learning for such a complex robot with real-time performance. In our sample task, the robot acquires the local inverse dynamics model needed to trace a figure-8 in only 60 seconds of training.

link (url) [BibTex]

link (url) [BibTex]


no image
Synchronized robot drumming by neural oscillator

Kotosaka, S., Schaal, S.

In The International Symposium on Adaptive Motion of Animals and Machines, Montreal, Canada, August 2000, clmc (inproceedings)

Abstract
Sensory-motor integration is one of the key issues in robotics. In this paper, we propose an approach to rhythmic arm movement control that is synchronized with an external signal based on exploiting a simple neural oscillator network. Trajectory generation by the neural oscillator is a biologically inspired method that can allow us to generate a smooth and continuous trajectory. The parameter tuning of the oscillators is used to generate a synchronized movement with wide intervals. We adopted the method for the drumming task as an example task. By using this method, the robot can realize synchronized drumming with wide drumming intervals in real time. The paper also shows the experimental results of drumming by a humanoid robot.

link (url) [BibTex]

link (url) [BibTex]


no image
Real-time robot learning with locally weighted statistical learning

Schaal, S., Atkeson, C. G., Vijayakumar, S.

In International Conference on Robotics and Automation (ICRA2000), San Francisco, April 2000, 2000, clmc (inproceedings)

Abstract
Locally weighted learning (LWL) is a class of statistical learning techniques that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL, memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional beliefs that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested in up to 50 dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing of a humanoid robot arm, and inverse-dynamics learning for a seven degree-of-freedom robot.

link (url) [BibTex]

link (url) [BibTex]


no image
Fast learning of biomimetic oculomotor control with nonparametric regression networks

Shibata, T., Schaal, S.

In International Conference on Robotics and Automation (ICRA2000), pages: 3847-3854, San Francisco, April 2000, 2000, clmc (inproceedings)

Abstract
Accurate oculomotor control is one of the essential pre-requisites of successful visuomotor coordination. Given the variable nonlinearities of the geometry of binocular vision as well as the possible nonlinearities of the oculomotor plant, it is desirable to accomplish accurate oculomotor control through learning approaches. In this paper, we investigate learning control for a biomimetic active vision system mounted on a humanoid robot. By combining a biologically inspired cerebellar learning scheme with a state-of-the-art statistical learning network, our robot system is able to acquire high performance visual stabilization reflexes after about 40 seconds of learning despite significant nonlinearities and processing delays in the system.

link (url) [BibTex]

link (url) [BibTex]


no image
Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional spaces

Vijayakumar, S., Schaal, S.

In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), 1, pages: 288-293, Stanford, CA, 2000, clmc (inproceedings)

Abstract
Locally weighted projection regression is a new algorithm that achieves nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space. This paper evaluates different methods of projection regression and derives a nonlinear function approximator based on them. This nonparametric local learning system i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic cross validation to learn iii) adjusts its weighting kernels based on local information only, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of - possibly redundant - inputs, as shown in evaluations with up to 50 dimensional data sets. To our knowledge, this is the first truly incremental spatially localized learning method to combine all these properties.

link (url) [BibTex]

link (url) [BibTex]


no image
Inverse kinematics for humanoid robots

Tevatia, G., Schaal, S.

In International Conference on Robotics and Automation (ICRA2000), pages: 294-299, San Fransisco, April 24-28, 2000, 2000, clmc (inproceedings)

Abstract
Real-time control of the endeffector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates methods of resolved motion rate control (RMRC) that employ optimization criteria to resolve kinematic redundancies. In particular we focus on two established techniques, the pseudo inverse with explicit optimization and the extended Jacobian method. We prove that the extended Jacobian method includes pseudo-inverse methods as a special solution. In terms of computational complexity, however, pseudo-inverse and extended Jacobian differ significantly in favor of pseudo-inverse methods. Employing numerical estimation techniques, we introduce a computationally efficient version of the extended Jacobian with performance comparable to the original version . Our results are illustrated in simulation studies with a multiple degree-of-freedom robot, and were tested on a 30 degree-of-freedom robot. 

link (url) [BibTex]

link (url) [BibTex]


no image
Fast and efficient incremental learning for high-dimensional movement systems

Vijayakumar, S., Schaal, S.

In International Conference on Robotics and Automation (ICRA2000), San Francisco, April 2000, 2000, clmc (inproceedings)

Abstract
We introduce a new algorithm, Locally Weighted Projection Regression (LWPR), for incremental real-time learning of nonlinear functions, as particularly useful for problems of autonomous real-time robot control that re-quires internal models of dynamics, kinematics, or other functions. At its core, LWPR uses locally linear models, spanned by a small number of univariate regressions in selected directions in input space, to achieve piecewise linear function approximation. The most outstanding properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic cross validation to learn iii) adjusts its local weighting kernels based on only local information to avoid interference problems, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number ofâ??possibly redundant and/or irrelevantâ??inputs, as shown in evaluations with up to 50 dimensional data sets for learning the inverse dynamics of an anthropomorphic robot arm. To our knowledge, this is the first incremental neural network learning method to combine all these properties and that is well suited for complex on-line learning problems in robotics.

link (url) [BibTex]

link (url) [BibTex]


no image
On-line learning for humanoid robot systems

Conradt, J., Tevatia, G., Vijayakumar, S., Schaal, S.

In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), 1, pages: 191-198, Stanford, CA, 2000, clmc (inproceedings)

Abstract
Humanoid robots are high-dimensional movement systems for which analytical system identification and control methods are insufficient due to unknown nonlinearities in the system structure. As a way out, supervised learning methods can be employed to create model-based nonlinear controllers which use functions in the control loop that are estimated by learning algorithms. However, internal models for humanoid systems are rather high-dimensional such that conventional learning algorithms would suffer from slow learning speed, catastrophic interference, and the curse of dimensionality. In this paper we explore a new statistical learning algorithm, locally weighted projection regression (LWPR), for learning internal models in real-time. LWPR is a nonparametric spatially localized learning system that employs the less familiar technique of partial least squares regression to represent functional relationships in a piecewise linear fashion. The algorithm can work successfully in very high dimensional spaces and detect irrelevant and redundant inputs while only requiring a computational complexity that is linear in the number of input dimensions. We demonstrate the application of the algorithm in learning two classical internal models of robot control, the inverse kinematics and the inverse dynamics of an actual seven degree-of-freedom anthropomorphic robot arm. For both examples, LWPR can achieve excellent real-time learning results from less than one hour of actual training data.

link (url) [BibTex]

link (url) [BibTex]


no image
Humanoid Robot DB

Kotosaka, S., Shibata, T., Schaal, S.

In Proceedings of the International Conference on Machine Automation (ICMA2000), pages: 21-26, 2000, clmc (inproceedings)

[BibTex]

[BibTex]

1993


no image
Roles for memory-based learning in robotics

Atkeson, C. G., Schaal, S.

In Proceedings of the Sixth International Symposium on Robotics Research, pages: 503-521, Hidden Valley, PA, 1993, clmc (inproceedings)

[BibTex]

1993

[BibTex]


no image
Open loop stable control strategies for robot juggling

Schaal, S., Atkeson, C. G.

In IEEE International Conference on Robotics and Automation, 3, pages: 913-918, Piscataway, NJ: IEEE, Georgia, Atlanta, May 2-6, 1993, clmc (inproceedings)

Abstract
In a series of case studies out of the field of dynamic manipulation (Mason, 1992), different principles for open loop stable control are introduced and analyzed. This investigation may provide some insight into how open loop control can serve as a useful foundation for closed loop control and, particularly, what to focus on in learning control. 

link (url) [BibTex]

link (url) [BibTex]