Header logo is am


2019


no image
On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset

Gondal, M. W., Wuthrich, M., Miladinovic, D., Locatello, F., Breidt, M., Volchkov, V., Akpo, J., Bachem, O., Schölkopf, B., Bauer, S.

Advances in Neural Information Processing Systems 32, pages: 15714-15725, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (conference)

link (url) [BibTex]

2019

link (url) [BibTex]


Accurate Vision-based Manipulation through Contact Reasoning
Accurate Vision-based Manipulation through Contact Reasoning

Kloss, A., Bauza, M., Wu, J., Tenenbaum, J. B., Rodriguez, A., Bohg, J.

In International Conference on Robotics and Automation, May 2019 (inproceedings) Accepted

Abstract
Planning contact interactions is one of the core challenges of many robotic tasks. Optimizing contact locations while taking dynamics into account is computationally costly and in only partially observed environments, executing contact-based tasks often suffers from low accuracy. We present an approach that addresses these two challenges for the problem of vision-based manipulation. First, we propose to disentangle contact from motion optimization. Thereby, we improve planning efficiency by focusing computation on promising contact locations. Second, we use a hybrid approach for perception and state estimation that combines neural networks with a physically meaningful state representation. In simulation and real-world experiments on the task of planar pushing, we show that our method is more efficient and achieves a higher manipulation accuracy than previous vision-based approaches.

Video link (url) [BibTex]

Video link (url) [BibTex]


Learning Latent Space Dynamics for Tactile Servoing
Learning Latent Space Dynamics for Tactile Servoing

Sutanto, G., Ratliff, N., Sundaralingam, B., Chebotar, Y., Su, Z., Handa, A., Fox, D.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings) Accepted

pdf video [BibTex]

pdf video [BibTex]


Leveraging Contact Forces for Learning to Grasp
Leveraging Contact Forces for Learning to Grasp

Merzic, H., Bogdanovic, M., Kappler, D., Righetti, L., Bohg, J.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings)

Abstract
Grasping objects under uncertainty remains an open problem in robotics research. This uncertainty is often due to noisy or partial observations of the object pose or shape. To enable a robot to react appropriately to unforeseen effects, it is crucial that it continuously takes sensor feedback into account. While visual feedback is important for inferring a grasp pose and reaching for an object, contact feedback offers valuable information during manipulation and grasp acquisition. In this paper, we use model-free deep reinforcement learning to synthesize control policies that exploit contact sensing to generate robust grasping under uncertainty. We demonstrate our approach on a multi-fingered hand that exhibits more complex finger coordination than the commonly used two- fingered grippers. We conduct extensive experiments in order to assess the performance of the learned policies, with and without contact sensing. While it is possible to learn grasping policies without contact sensing, our results suggest that contact feedback allows for a significant improvement of grasping robustness under object pose uncertainty and for objects with a complex shape.

video arXiv [BibTex]

video arXiv [BibTex]

2006


no image
Learning operational space control

Peters, J., Schaal, S.

In Robotics: Science and Systems II (RSS 2006), pages: 255-262, (Editors: Gaurav S. Sukhatme and Stefan Schaal and Wolfram Burgard and Dieter Fox), Cambridge, MA: MIT Press, RSS , 2006, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-covexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exits when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constraint optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasability of our suggested approach.

link (url) [BibTex]

2006

link (url) [BibTex]


no image
Reinforcement Learning for Parameterized Motor Primitives

Peters, J., Schaal, S.

In Proceedings of the 2006 International Joint Conference on Neural Networks, pages: 73-80, IJCNN, 2006, clmc (inproceedings)

Abstract
One of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Policy gradient methods for robotics

Peters, J., Schaal, S.

In Proceedings of the IEEE International Conference on Intelligent Robotics Systems, pages: 2219-2225, IROS, 2006, clmc (inproceedings)

Abstract
The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-structured environments. However, to date only few existing reinforcement learning methods have been scaled into the domains of highdimensional robots such as manipulator, legged or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.

link (url) DOI [BibTex]

link (url) DOI [BibTex]

2002


no image
Learning rhythmic movements by demonstration using nonlinear oscillators

Ijspeert, J. A., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2002), pages: 958-963, Piscataway, NJ: IEEE, Lausanne, Sept.30-Oct.4 2002, 2002, clmc (inproceedings)

Abstract
Locally weighted learning (LWL) is a class of statistical learning techniques that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL, memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional beliefs that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested in up to 50 dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing of a humanoid robot arm, and inverse-dynamics learning for a seven degree-of-freedom robot.

link (url) [BibTex]

2002

link (url) [BibTex]


no image
Reliable stair climbing in the simple hexapod ’RHex’

Moore, E. Z., Campbell, D., Grimminger, F., Buehler, M.

In Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292), 3, pages: 2222-2227 vol.3, May 2002 (inproceedings)

DOI [BibTex]

DOI [BibTex]


no image
Movement imitation with nonlinear dynamical systems in humanoid robots

Ijspeert, J. A., Nakanishi, J., Schaal, S.

In International Conference on Robotics and Automation (ICRA2002), Washinton, May 11-15 2002, 2002, clmc (inproceedings)

Abstract
Locally weighted learning (LWL) is a class of statistical learning techniques that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL, memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional beliefs that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested in up to 50 dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing of a humanoid robot arm, and inverse-dynamics learning for a seven degree-of-freedom robot.

link (url) [BibTex]

link (url) [BibTex]


no image
A locally weighted learning composite adaptive controller with structure adaptation

Nakanishi, J., Farrell, J. A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2002), Lausanne, Sept.30-Oct.4 2002, 2002, clmc (inproceedings)

Abstract
This paper introduces a provably stable adaptive learning controller which employs nonlinear function approximation with automatic growth of the learning network according to the nonlinearities and the working domain of the control system. The unknown function in the dynamical system is approximated by piecewise linear models using a nonparametric regression technique. Local models are allocated as necessary and their parameters are optimized on-line. Inspired by composite adaptive control methods, the pro-posed learning adaptive control algorithm uses both the tracking error and the estimation error to up-date the parameters. We provide Lyapunov analyses that demonstrate the stability properties of the learning controller. Numerical simulations illustrate rapid convergence of the tracking error and the automatic structure adaptation capability of the function approximator. This paper introduces a provably stable adaptive learning controller which employs nonlinear function approximation with automatic growth of the learning network according to the nonlinearities and the working domain of the control system. The unknown function in the dynamical system is approximated by piecewise linear models using a nonparametric regression technique. Local models are allocated as necessary and their parameters are optimized on-line. Inspired by composite adaptive control methods, the pro-posed learning adaptive control algorithm uses both the tracking error and the estimation error to up-date the parameters. We provide Lyapunov analyses that demonstrate the stability properties of the learning controller. Numerical simulations illustrate rapid convergence of the tracking error and the automatic structure adaptation capability of the function approximator

link (url) [BibTex]

link (url) [BibTex]

1997


no image
Learning from demonstration

Schaal, S.

In Advances in Neural Information Processing Systems 9, pages: 1040-1046, (Editors: Mozer, M. C.;Jordan, M.;Petsche, T.), MIT Press, Cambridge, MA, 1997, clmc (inproceedings)

Abstract
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration of the human instructor. 

link (url) [BibTex]

1997

link (url) [BibTex]


no image
Robot learning from demonstration

Atkeson, C. G., Schaal, S.

In Machine Learning: Proceedings of the Fourteenth International Conference (ICML ’97), pages: 12-20, (Editors: Fisher Jr., D. H.), Morgan Kaufmann, Nashville, TN, July 8-12, 1997, 1997, clmc (inproceedings)

Abstract
The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration the robot learns a reward function from the demonstration and a task model from repeated attempts to perform the task. A policy is computed based on the learned reward function and task model. Lessons learned from an implementation on an anthropomorphic robot arm using a pendulum swing up task include 1) simply mimicking demonstrated motions is not adequate to perform this task, 2) a task planner can use a learned model and reward function to compute an appropriate policy, 3) this model-based planning process supports rapid learning, 4) both parametric and nonparametric models can be learned and used, and 5) incorporating a task level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning. 

link (url) [BibTex]

link (url) [BibTex]


no image
Local dimensionality reduction for locally weighted learning

Vijayakumar, S., Schaal, S.

In International Conference on Computational Intelligence in Robotics and Automation, pages: 220-225, Monteray, CA, July10-11, 1997, 1997, clmc (inproceedings)

Abstract
Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In this paper we suggest a partial revision of the view. Based on empirical studies, it can been observed that, despite being globally high dimensional and sparse, data distributions from physical movement systems are locally low dimensional and dense. Under this assumption, we derive a learning algorithm, Locally Adaptive Subspace Regression, that exploits this property by combining a local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression. The usefulness of the algorithm and the validity of its assumptions are illustrated for a synthetic data set and data of the inverse dynamics of an actual 7 degree-of-freedom anthropomorphic robot arm.

link (url) [BibTex]

link (url) [BibTex]


no image
Learning tasks from a single demonstration

Atkeson, C. G., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA97), 2, pages: 1706-1712, Piscataway, NJ: IEEE, Albuquerque, NM, 20-25 April, 1997, clmc (inproceedings)

Abstract
Learning a complex dynamic robot manoeuvre from a single human demonstration is difficult. This paper explores an approach to learning from demonstration based on learning an optimization criterion from the demonstration and a task model from repeated attempts to perform the task, and using the learned criterion and model to compute an appropriate robot movement. A preliminary version of the approach has been implemented on an anthropomorphic robot arm using a pendulum swing up task as an example

link (url) [BibTex]

link (url) [BibTex]