

2013


Probabilistic Object Tracking Using a Range Camera

Wüthrich, M., Pastor, P., Kalakrishnan, M., Bohg, J., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3195-3202, IEEE, November 2013 (inproceedings)

Abstract
We address the problem of tracking the 6-DoF pose of an object while it is being manipulated by a human or a robot. We use a dynamic Bayesian network to perform inference and compute a posterior distribution over the current object pose. Depending on whether a robot or a human manipulates the object, we employ a process model with or without knowledge of control inputs. Observations are obtained from a range camera. As opposed to previous object tracking methods, we explicitly model self-occlusions and occlusions from the environment, e.g., the human or robotic hand. This leads to a strongly non-linear observation model and additional dependencies in the Bayesian network. We employ a Rao-Blackwellised particle filter to compute an estimate of the object pose at every time step. In a set of experiments, we demonstrate the ability of our method to accurately and robustly track the object pose in real time while it is being manipulated by a human or a robot.

arXiv Video Code Video DOI Project Page [BibTex]



Learning and Optimization with Submodular Functions

Sankaran, B., Ghazvininejad, M., He, X., Kale, D., Cohen, L.

ArXiv, May 2013 (techreport)

Abstract
In many naturally occurring optimization problems one needs to ensure that the definition of the optimization problem lends itself to solutions that are tractable to compute. In cases where exact solutions cannot be computed tractably, it is beneficial to have strong guarantees on the tractable approximate solutions. In order to operate under these criteria, most optimization problems are cast under the umbrella of convexity or submodularity. In this report we will study design and optimization over a common class of functions called submodular functions. Set functions, and specifically submodular set functions, characterize a wide variety of naturally occurring optimization problems, and the property of submodularity of set functions has deep theoretical consequences with wide-ranging applications. Informally, the property of submodularity of set functions concerns the intuitive principle of diminishing returns. This property states that adding an element to a smaller set has more value than adding it to a larger set. Common examples of submodular monotone functions are entropies, concave functions of cardinality, and matroid rank functions; non-monotone examples include graph cuts, network flows, and mutual information. In this paper we will review the formal definition of submodularity; the optimization of submodular functions, both maximization and minimization; and finally discuss some applications in relation to learning and reasoning using submodular functions.
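
The diminishing-returns property described in this abstract can be illustrated with a small sketch. The example below is not taken from the report: it uses a set-coverage function (a standard monotone submodular function), and the names `coverage`, `marginal_gain`, `is_submodular`, `greedy_max`, and the data in `SETS` are all hypothetical. It checks the property by brute force and then runs the classical greedy heuristic for choosing `k` subsets.

```python
from itertools import combinations


def coverage(sets, chosen):
    """Number of distinct items covered by the subsets whose indices are in `chosen`."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def marginal_gain(sets, chosen, e):
    """Extra coverage obtained by adding subset index `e` to `chosen`."""
    return coverage(sets, chosen | {e}) - coverage(sets, chosen)


def is_submodular(sets):
    """Brute-force check: for every A subset of B and e outside B, gain(A, e) >= gain(B, e)."""
    ground = range(len(sets))
    subsets = [frozenset(c) for r in range(len(sets) + 1)
               for c in combinations(ground, r)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for e in ground:
                if e in b:
                    continue
                if marginal_gain(sets, a, e) < marginal_gain(sets, b, e):
                    return False
    return True


def greedy_max(sets, k):
    """Pick up to k subset indices, each time taking the largest marginal gain."""
    chosen = frozenset()
    for _ in range(k):
        candidates = [e for e in range(len(sets)) if e not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda e: marginal_gain(sets, chosen, e))
        chosen = chosen | {best}
    return chosen


# Hypothetical toy data: four subsets of the items 1..7.
SETS = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]

if __name__ == "__main__":
    print(is_submodular(SETS))
    picked = greedy_max(SETS, 2)
    print(sorted(picked), coverage(SETS, picked))
```

The brute-force check is exponential in the number of subsets and is only meant for toy inputs. For monotone submodular functions under a cardinality constraint, the greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value.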

arxiv link (url) [BibTex]



Hypothesis Testing Framework for Active Object Detection

Sankaran, B., Atanasov, N., Le Ny, J., Koletschka, T., Pappas, G., Daniilidis, K.

In IEEE International Conference on Robotics and Automation (ICRA), May 2013, clmc (inproceedings)

Abstract
One of the central problems in computer vision is the detection of semantically important objects and the estimation of their pose. Most of the work in object detection has been based on single image processing and its performance is limited by occlusions and ambiguity in appearance and geometry. This paper proposes an active approach to object detection by controlling the point of view of a mobile depth camera. When an initial static detection phase identifies an object of interest, several hypotheses are made about its class and orientation. The sensor then plans a sequence of view-points, which balances the amount of energy used to move with the chance of identifying the correct hypothesis. We formulate an active M-ary hypothesis testing problem, which includes sensor mobility, and solve it using a point-based approximate POMDP algorithm. The validity of our approach is verified through simulation and experiments with real scenes captured by a Kinect sensor. The results suggest a significant improvement over static object detection.
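
The M-ary hypothesis testing idea in this abstract can be sketched as a sequential Bayesian update over a fixed set of hypotheses. The sketch below is an illustration only: it leaves out view-point planning, movement cost, and the POMDP solver, and the function names, the stopping threshold, and the likelihood values are assumptions rather than anything from the paper.

```python
def bayes_update(prior, likelihoods):
    """Posterior over hypotheses after one observation with the given likelihoods."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run_sequential_test(prior, observation_likelihoods, threshold=0.95):
    """Update the belief view by view; stop once one hypothesis exceeds `threshold`.

    Returns (index of accepted hypothesis or None, views used, final belief).
    """
    belief = list(prior)
    for step, likelihoods in enumerate(observation_likelihoods, start=1):
        belief = bayes_update(belief, likelihoods)
        best = max(range(len(belief)), key=belief.__getitem__)
        if belief[best] >= threshold:
            return best, step, belief
    return None, len(observation_likelihoods), belief


if __name__ == "__main__":
    # Hypothetical likelihoods of three successive views under three hypotheses.
    views = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1]]
    decision, used, belief = run_sequential_test([1 / 3] * 3, views)
    print(decision, used, [round(b, 3) for b in belief])
```

With these made-up numbers the belief in the first hypothesis is 0.336 / 0.343 after the third view, so the test stops there. An active variant would additionally choose which view to take next.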

pdf [BibTex]

Action and Goal Related Decision Variables Modulate the Competition Between Multiple Potential Targets

Enachescu, V., Christopoulos, V. N., Schrater, P. R., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2013), February 2013 (inproceedings)

[BibTex]



Fusing visual and tactile sensing for 3-D object reconstruction while grasping

Ilonen, J., Bohg, J., Kyrki, V.

In IEEE International Conference on Robotics and Automation (ICRA), pages: 3547-3554, 2013 (inproceedings)

Abstract
In this work, we propose to reconstruct a complete 3-D model of an unknown object by fusion of visual and tactile information while the object is grasped. Assuming the object is symmetric, a first hypothesis of its complete 3-D shape is generated from a single view. This initial model is used to plan a grasp on the object which is then executed with a robotic manipulator equipped with tactile sensors. Given the detected contacts between the fingers and the object, the full object model including the symmetry parameters can be refined. This refined model will then allow the planning of more complex manipulation tasks. The main contribution of this work is an optimal estimation approach for the fusion of visual and tactile data applying the constraint of object symmetry. The fusion is formulated as a state estimation problem and solved with an iterative extended Kalman filter. The approach is validated experimentally using both artificial and real data from two different robotic platforms.

DOI Project Page [BibTex]

Learning Objective Functions for Manipulation

Kalakrishnan, M., Pastor, P., Righetti, L., Schaal, S.

In 2013 IEEE International Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
We present an approach to learning objective functions for robotic manipulation based on inverse reinforcement learning. Our path integral inverse reinforcement learning algorithm can deal with high-dimensional continuous state-action spaces, and only requires local optimality of demonstrated trajectories. We use L1 regularization in order to achieve feature selection, and propose an efficient algorithm to minimize the resulting convex objective function. We demonstrate our approach by applying it to two core problems in robotic manipulation. First, we learn a cost function for redundancy resolution in inverse kinematics. Second, we use our method to learn a cost function over trajectories, which is then used in optimization-based motion planning for grasping and manipulation tasks. Experimental results show that our method outperforms previous algorithms in high-dimensional settings.

link (url) DOI [BibTex]

Learning Task Error Models for Manipulation

Pastor, P., Kalakrishnan, M., Binney, J., Kelly, J., Righetti, L., Sukhatme, G. S., Schaal, S.

In 2013 IEEE Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g., due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task-relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counterbalancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing, contributing to the successful completion of 67 out of 72 grasping and manipulation tasks.

link (url) DOI [BibTex]


2011


STOMP: Stochastic trajectory optimization for motion planning

Kalakrishnan, M., Chitta, S., Theodorou, E., Pastor, P., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011, clmc (inproceedings)

Abstract
We present a new approach to motion planning using a stochastic trajectory optimization framework. The approach relies on generating noisy trajectories to explore the space around an initial (possibly infeasible) trajectory, which are then combined to produce an updated trajectory with lower cost. A cost function based on a combination of obstacle and smoothness cost is optimized in each iteration. No gradient information is required for the particular optimization algorithm that we use, and so general costs for which derivatives may not be available (e.g., costs corresponding to constraints and motor torques) can be included in the cost function. We demonstrate the approach both in simulation and on a dual-arm mobile manipulation system for unconstrained and constrained tasks. We experimentally show that the stochastic nature of STOMP allows it to overcome local minima that gradient-based optimizers like CHOMP can get stuck in.
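
The "sample noisy trajectories, then combine them" idea from this abstract can be sketched in a few lines. This is a deliberately simplified illustration and not the STOMP algorithm as published: it works on a one-dimensional trajectory, uses independent Gaussian noise per waypoint, weights whole trajectories by their total cost rather than per time step, and keeps the best trajectory seen so far. The cost terms, parameter values, and all function names are assumptions.

```python
import math
import random


def trajectory_cost(traj, obstacle_center=0.5, obstacle_radius=0.3, smooth_weight=5.0):
    """Obstacle penalty on the middle third of the waypoints plus a smoothness term."""
    smooth = sum((b - a) ** 2 for a, b in zip(traj, traj[1:]))
    n = len(traj)
    obstacle = sum(max(0.0, obstacle_radius - abs(y - obstacle_center))
                   for y in traj[n // 3: 2 * n // 3])
    return obstacle + smooth_weight * smooth


def stochastic_update(traj, rng, num_samples=20, noise_std=0.1, sensitivity=10.0):
    """One gradient-free update: a cost-weighted average of sampled perturbations."""
    n = len(traj)
    noises, costs = [], []
    for _ in range(num_samples):
        # Endpoints get zero noise so that start and goal stay fixed.
        eps = [0.0] + [rng.gauss(0.0, noise_std) for _ in range(n - 2)] + [0.0]
        noises.append(eps)
        costs.append(trajectory_cost([y + e for y, e in zip(traj, eps)]))
    lo, hi = min(costs), max(costs)
    span = (hi - lo) or 1.0
    # Lower-cost samples receive exponentially larger weights.
    weights = [math.exp(-sensitivity * (c - lo) / span) for c in costs]
    total = sum(weights)
    return [y + sum(w * eps[i] for w, eps in zip(weights, noises)) / total
            for i, y in enumerate(traj)]


def optimize(traj, iterations=50, seed=0):
    """Run repeated updates and return the lowest-cost trajectory encountered."""
    rng = random.Random(seed)
    best, best_cost = list(traj), trajectory_cost(traj)
    current = list(traj)
    for _ in range(iterations):
        current = stochastic_update(current, rng)
        cost = trajectory_cost(current)
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost


if __name__ == "__main__":
    initial = [i / 19 for i in range(20)]  # straight line through the obstacle band
    best, best_cost = optimize(initial)
    print(trajectory_cost(initial), best_cost)
```

Because the update only evaluates the cost function, the obstacle term can be non-differentiable, which is the point the abstract makes about gradient-free optimization.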

link (url) Project Page [BibTex]

Path Integral Control and Bounded Rationality

Braun, D. A., Ortega, P. A., Theodorou, E., Schaal, S.

In IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011, clmc (inproceedings)

Abstract
Path integral methods [7], [15], [1] have recently been shown to be applicable to a very general class of optimal control problems. Here we examine the path integral formalism from a decision-theoretic point of view, since an optimal controller can always be regarded as an instance of a perfectly rational decision-maker that chooses its actions so as to maximize its expected utility [8]. The problem with perfect rationality is, however, that finding optimal actions is often very difficult due to prohibitive computational resource costs that are not taken into account. In contrast, a bounded rational decision-maker has only limited resources and therefore needs to strike some compromise between the desired utility and the required resource costs [14]. In particular, we suggest an information-theoretic measure of resource costs that can be derived axiomatically [11]. As a consequence we obtain a variational principle for choice probabilities that trades off maximizing a given utility criterion and avoiding resource costs that arise due to deviating from initially given default choice probabilities. The resulting bounded rational policies are in general probabilistic. We show that the solutions found by the path integral formalism are such bounded rational policies. Furthermore, we show that the same formalism generalizes to discrete control problems, leading to linearly solvable bounded rational control policies in the case of Markov systems. Importantly, Bellman's optimality principle is not presupposed by this variational principle, but it can be derived as a limit case. This suggests that the information-theoretic formalization of bounded rationality might serve as a general principle in control design that unifies a number of recently reported approximate optimal control methods both in the continuous and discrete domain.

PDF [BibTex]

Skill learning and task outcome prediction for manipulation

Pastor, P., Kalakrishnan, M., Chitta, S., Theodorou, E., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011, clmc (inproceedings)

Abstract
Learning complex motor skills for real world tasks is a hard problem in robotic manipulation that often requires painstaking manual tuning and design by a human expert. In this work, we present a Reinforcement Learning based approach to acquiring new motor skills from demonstration. Our approach allows the robot to learn fine manipulation skills and significantly improve its success rate and skill level starting from a possibly coarse demonstration. Our approach aims to incorporate task domain knowledge, where appropriate, by working in a space consistent with the constraints of a specific task. In addition, we also present an approach to using sensor feedback to learn a predictive model of the task outcome. This allows our system to learn the proprioceptive sensor feedback needed to monitor subsequent executions of the task online and abort execution in the event of predicted failure. We illustrate our approach using two example tasks executed with the PR2 dual-arm robot: a straight and accurate pool stroke and a box flipping task using two chopsticks as tools.

link (url) Project Page Project Page [BibTex]

An Iterative Path Integral Stochastic Optimal Control Approach for Learning Robotic Tasks

Theodorou, E., Stulp, F., Buchli, J., Schaal, S.

In Proceedings of the 18th World Congress of the International Federation of Automatic Control, 2011, clmc (inproceedings)

Abstract
Recent work on path integral stochastic optimal control theory (Theodorou et al. (2010a); Theodorou (2011)) has shown promising results in planning and control of nonlinear systems in high-dimensional state spaces. The path integral control framework relies on the transformation of the nonlinear Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE) into a linear PDE and the approximation of its solution via the use of the Feynman-Kac lemma. In this work, we review the generalized version of the path integral stochastic optimal control formalism (Theodorou et al. (2010a)), used for optimal control and planning of stochastic dynamical systems with state-dependent control and diffusion matrices. Moreover, we present the iterative path integral control approach, the so-called Policy Improvement with Path Integrals (PI2), which is capable of scaling to high-dimensional robotic control problems. Furthermore, we present a convergence analysis of the proposed algorithm and we apply the proposed framework to a variety of robotic tasks. Finally, with the goal of performing locomotion, the iterative path integral control is applied for learning nonlinear limit cycle attractors with adjustable landscape.

PDF [BibTex]

Learning Force Control Policies for Compliant Manipulation

Kalakrishnan, M., Righetti, L., Pastor, P., Schaal, S.

In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 4639-4644, IEEE, San Francisco, USA, sep 2011 (inproceedings)

Abstract
Developing robots capable of fine manipulation skills is of major importance in order to build truly assistive robots. These robots need to be compliant in their actuation and control in order to operate safely in human environments. Manipulation tasks imply complex contact interactions with the external world, and involve reasoning about the forces and torques to be applied. Planning under contact conditions is usually impractical due to computational complexity, and a lack of precise dynamics models of the environment. We present an approach to acquiring manipulation skills on compliant robots through reinforcement learning. The initial position control policy for manipulation is initialized through kinesthetic demonstration. We augment this policy with a force/torque profile to be controlled in combination with the position trajectories. We use the Policy Improvement with Path Integrals (PI2) algorithm to learn these force/torque profiles by optimizing a cost function that measures task success. We demonstrate our approach on the Barrett WAM robot arm equipped with a 6-DOF force/torque sensor on two different manipulation tasks: opening a door with a lever door handle, and picking up a pen off the table. We show that the learnt force control policies allow successful, robust execution of the tasks.

link (url) DOI [BibTex]

Control of legged robots with optimal distribution of contact forces

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In 2011 11th IEEE-RAS International Conference on Humanoid Robots, pages: 318-324, IEEE, Bled, Slovenia, 2011 (inproceedings)

Abstract
The development of agile and safe humanoid robots requires controllers that guarantee both high tracking performance and compliance with the environment. More specifically, the control of contact interaction is of crucial importance for robots that will actively interact with their environment. Model-based controllers such as inverse dynamics or operational space control are very appealing as they offer both high tracking performance and compliance. However, while widely used for fully actuated systems such as manipulators, they are not yet standard controllers for legged robots such as humanoids. Indeed, such robots are fundamentally different from manipulators as they are underactuated due to their floating base and subject to switching contact constraints. In this paper we present an inverse dynamics controller for legged robots that uses torque redundancy to create an optimal distribution of contact constraints. The resulting controller is able to minimize, given a desired motion, any quadratic cost of the contact constraints at each instant of time. In particular, we show how this can be used to minimize tangential forces during locomotion, therefore significantly improving the locomotion of legged robots on difficult terrains. In addition to the theoretical result, we present simulations of a humanoid and a quadruped robot, as well as experiments on a real quadruped robot that demonstrate the advantages of the controller.

link (url) DOI [BibTex]

Learning Motion Primitive Goals for Robust Manipulation

Stulp, F., Theodorou, E., Kalakrishnan, M., Pastor, P., Righetti, L., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 325-331, IEEE, San Francisco, USA, sep 2011 (inproceedings)

Abstract
Applying model-free reinforcement learning to manipulation remains challenging for several reasons. First, manipulation involves physical contact, which causes discontinuous cost functions. Second, in manipulation, the end-point of the movement must be chosen carefully, as it represents a grasp which must be adapted to the pose and shape of the object. Finally, there is uncertainty in the object pose, and even the most carefully planned movement may fail if the object is not at the expected position. To address these challenges we 1) present a simplified, computationally more efficient version of our model-free reinforcement learning algorithm PI2; 2) extend PI2 so that it simultaneously learns shape parameters and goal parameters of motion primitives; 3) use shape and goal learning to acquire motion primitives that are robust to object pose uncertainty. We evaluate these contributions on a manipulation platform consisting of a 7-DOF arm with a 4-DOF hand.

link (url) DOI [BibTex]

Inverse Dynamics Control of Floating-Base Robots with External Constraints: a Unified View

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In 2011 IEEE International Conference on Robotics and Automation, pages: 1085-1090, IEEE, Shanghai, China, 2011 (inproceedings)

Abstract
Inverse dynamics controllers and operational space controllers have proved to be very efficient for compliant control of fully actuated robots such as fixed-base manipulators. However, legged robots such as humanoids are inherently different as they are underactuated and subject to switching external contact constraints. Recently, several methods have been proposed to create inverse dynamics controllers and operational space controllers for these robots. In an attempt to compare these different approaches, we develop a general framework for inverse dynamics control and show that these methods lead to very similar controllers. We are then able to greatly simplify recent whole-body controllers based on operational space approaches using kinematic projections, bringing them closer to efficient practical implementations. We also generalize these controllers such that they can be optimal under an arbitrary quadratic cost in the commands.

link (url) DOI [BibTex]

Movement segmentation using a primitive library

Meier, F., Theodorou, E., Stulp, F., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), Sept. 25-30, San Francisco, CA, 2011, clmc (inproceedings)

Abstract
Segmenting complex movements into a sequence of primitives remains a difficult problem with many applications in the robotics and vision communities. In this work, we show how the movement segmentation problem can be reduced to a sequential movement recognition problem. To this end, we reformulate the original Dynamic Movement Primitive (DMP) formulation as a linear dynamical system with control inputs. Based on this new formulation, we develop an Expectation-Maximization algorithm to estimate the duration and goal position of a partially observed trajectory. With the help of this algorithm and the assumption that a library of movement primitives is present, we present a movement segmentation framework. We illustrate the usefulness of the new DMP formulation on the two applications of online movement recognition and movement segmentation.

link (url) [BibTex]

Online movement adaptation based on previous sensor experiences

Pastor, P., Righetti, L., Kalakrishnan, M., Schaal, S.

In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 365-371, IEEE, San Francisco, USA, sep 2011 (inproceedings)

Abstract
Personal robots can only become widespread if they are capable of safely operating among humans. In uncertain and highly dynamic environments such as human households, robots need to be able to instantly adapt their behavior to unforeseen events. In this paper, we propose a general framework to achieve very contact-reactive motions for robotic grasping and manipulation. Associating stereotypical movements to particular tasks enables our system to use previous sensor experiences as a predictive model for subsequent task executions. We use dynamical systems, named Dynamic Movement Primitives (DMPs), to learn goal-directed behaviors from demonstration. We exploit their dynamic properties by coupling them with the measured and predicted sensor traces. This feedback loop allows for online adaptation of the movement plan. Our system can create a rich set of possible motions that account for external perturbations and perception uncertainty to generate truly robust behaviors. As an example, we present an application to grasping with the WAM robot arm.

link (url) DOI [BibTex]

Additional DOFs and sensors for bio-inspired locomotion: Towards active spine, ankle joints, and feet for a quadruped robot

Kuehn, D., Grimminger, F., Beinersdorf, F., Bernhard, F., Burchardt, A., Schilling, M., Simnofske, M., Stark, T., Zenzes, M., Kirchner, F.

In 2011 IEEE International Conference on Robotics and Biomimetics, pages: 2780-2786, December 2011 (inproceedings)

DOI [BibTex]

Learning to grasp under uncertainty

Stulp, F., Theodorou, E., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2011 IEEE International Conference on, Shanghai, China, May 9-13, 2011, clmc (inproceedings)

Abstract
We present an approach that enables robots to learn motion primitives that are robust towards state estimation uncertainties. During reaching and preshaping, the robot learns to use fine manipulation strategies to maneuver the object into a pose at which closing the hand to perform the grasp is more likely to succeed. In contrast, common assumptions in grasp planning and motion planning for reaching are that these tasks can be performed independently, and that the robot has perfect knowledge of the pose of the objects in the environment. We implement our approach using Dynamic Movement Primitives and the probabilistic model-free reinforcement learning algorithm Policy Improvement with Path Integrals (PI2). The cost function that PI2 optimizes is a simple boolean that penalizes failed grasps. The key to acquiring robust motion primitives is to sample the actual pose of the object from a distribution that represents the state estimation uncertainty. During learning, the robot will thus optimize the chance of grasping an object from this distribution, rather than at one specific pose. In our empirical evaluation, we demonstrate how the motion primitives become more robust when grasping simple cylindrical objects, as well as more complex, non-convex objects. We also investigate how well the learned motion primitives generalize towards new object positions and other state estimation uncertainty distributions.

link (url) [BibTex]


2006


Learning operational space control

Peters, J., Schaal, S.

In Robotics: Science and Systems II (RSS 2006), pages: 255-262, (Editors: Gaurav S. Sukhatme and Stefan Schaal and Wolfram Burgard and Dieter Fox), Cambridge, MA: MIT Press, RSS, 2006, clmc (inproceedings)

Abstract
While operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in the face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires learning an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-convexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exist when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constrained optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasibility of our suggested approach.

link (url) [BibTex]

Reinforcement Learning for Parameterized Motor Primitives

Peters, J., Schaal, S.

In Proceedings of the 2006 International Joint Conference on Neural Networks, pages: 73-80, IJCNN, 2006, clmc (inproceedings)

Abstract
One of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. In pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

link (url) DOI [BibTex]

Statistical Learning of LQG controllers

Theodorou, E.

Technical Report-2006-1, Computational Action and Vision Lab University of Minnesota, 2006, clmc (techreport)

PDF [BibTex]


2003


Dynamic movement primitives - A framework for motor control in humans and humanoid robots

Schaal, S.

In The International Symposium on Adaptive Motion of Animals and Machines, Kyoto, Japan, March 4-8, 2003, March 2003, clmc (inproceedings)

Abstract
Sensory-motor integration is one of the key issues in robotics. In this paper, we propose an approach to rhythmic arm movement control that is synchronized with an external signal based on exploiting a simple neural oscillator network. Trajectory generation by the neural oscillator is a biologically inspired method that can allow us to generate a smooth and continuous trajectory. The parameter tuning of the oscillators is used to generate a synchronized movement with wide intervals. We adopted the method for the drumming task as an example task. By using this method, the robot can realize synchronized drumming with wide drumming intervals in real time. The paper also shows the experimental results of drumming by a humanoid robot.

link (url) [BibTex]

Bayesian backfitting

D’Souza, A., Vijayakumar, S., Schaal, S.

In Proceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003), Irvine, CA, May 2003, 2003, clmc (inproceedings)

Abstract
We present an algorithm aimed at addressing both computational and analytical intractability of Bayesian regression models which operate in very high-dimensional, usually underconstrained spaces. Several domains of research frequently provide such datasets, including chemometrics [2], and human movement analysis [1]. The literature in nonparametric statistics provides interesting solutions such as Backfitting [3] and Partial Least Squares [4], which are extremely robust and efficient, yet lack a probabilistic interpretation that could place them in the context of current research in statistical learning algorithms that emphasize the estimation of confidence, posterior distributions, and model complexity. In order to achieve numerical robustness and low computational cost, we first derive a novel Bayesian interpretation of Backfitting (BB) as a computationally efficient regression algorithm. BB's learning complexity scales linearly with the input dimensionality by decoupling inference among individual input dimensions. We embed BB in an efficient, locally variational model selection mechanism that automatically grows the number of backfitting experts in a mixture-of-experts regression model. We demonstrate the effectiveness of the algorithm in performing principled regularization of model complexity when fitting nonlinear manifolds while avoiding the numerical hazards associated with highly underconstrained problems. We also note that this algorithm appears applicable in various areas of neural computation, e.g., in abstract models of computational neuroscience, or implementations of statistical learning on artificial systems.

link (url) [BibTex]

Reinforcement learning for humanoid robotics

Peters, J., Vijayakumar, S., Schaal, S.

In IEEE-RAS International Conference on Humanoid Robots (Humanoids2003), Karlsruhe, Germany, Sept.29-30, 2003, clmc (inproceedings)

Abstract
Reinforcement learning offers one of the most general frameworks to take traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches of reinforcement learning in terms of their applicability in humanoid robotics. Methods can be coarsely classified into three different categories, i.e., greedy methods, 'vanilla' policy gradient methods, and natural gradient methods. We argue that greedy methods are not likely to scale into the domain of humanoid robotics as they are problematic when used with function approximation. 'Vanilla' policy gradient methods, on the other hand, have been successfully applied on real-world robots including at least one humanoid robot. We demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. A derivation of the natural policy gradient is provided, proving that the average policy gradient of Kakade (2002) is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges to the nearest local minimum of the cost function with respect to the Fisher information metric under suitable conditions. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation, and for learning nonlinear dynamic motor primitives for humanoid robot control. It offers a promising route for the development of reinforcement learning for truly high dimensionally continuous state-action systems.

link (url) [BibTex]

Discovering imitation strategies through categorization of multi-dimensional data

Billard, A., Epars, Y., Schaal, S., Cheng, G.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2003), Las Vegas, NV, Oct. 27-31, 2003, clmc (inproceedings)

Abstract
An essential problem of imitation is that of determining "what to imitate", i.e. to determine which of the many features of the demonstration are relevant to the task and which should be reproduced. The strategy followed by the imitator can be modeled as a hierarchical optimization system, which minimizes the discrepancy between two multidimensional datasets. We consider imitation of a manipulation task. To classify across manipulation strategies, we apply a probabilistic analysis to data in Cartesian and joint spaces. We determine a general metric that optimizes the policy of task reproduction, following strategy determination. The model successfully discovers strategies in six different manipulation tasks and controls task reproduction by a full body humanoid robot. or the complete path followed by the demonstrator. We follow a similar taxonomy and apply it to the learning and reproduction of a manipulation task by a humanoid robot. We take the perspective that the features of the movements to imitate are those that appear most frequently, i.e. the invariants in time. The model builds upon previous work [3], [4] and is composed of a hierarchical time delay neural network that extracts invariant features from a manipulation task performed by a human demonstrator. The system analyzes the Cartesian trajectories of the objects and the joint

link (url) [BibTex]

Scaling reinforcement learning paradigms for motor learning

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003), Irvine, CA, May 2003, 2003, clmc (inproceedings)

Abstract
Reinforcement learning offers a general framework to explain reward related learning in artificial and biological motor control. However, current reinforcement learning methods rarely scale to high dimensional movement systems and mainly operate in discrete, low dimensional domains like game-playing, artificial toy problems, etc. This drawback makes them unsuitable for application to human or bio-mimetic motor control. In this poster, we look at promising approaches that can potentially scale and suggest a novel formulation of the actor-critic algorithm which takes steps towards alleviating the current shortcomings. We argue that methods based on greedy policies are not likely to scale into high-dimensional domains as they are problematic when used with function approximation, a must when dealing with continuous domains. We adopt the path of direct policy gradient based policy improvements since they avoid the problems of destabilizing dynamics encountered in traditional value iteration based updates. While regular policy gradient methods have demonstrated promising results in the domain of humanoid motor control, we demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. Based on this, it is proved that Kakade's 'average natural policy gradient' is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges with probability one to the nearest local minimum in Riemannian space of the cost function. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation, and offers a promising route for the development of reinforcement learning for truly high-dimensionally continuous state-action systems.

link (url) [BibTex]

Design and Control of a Leg for the Running Machine PANTER

Berns, K., Grimminger, F., Hochholdinger, U., Kerscher, T., Albiez, J.

In Proceedings of the ICAR 2003–11th International Conference on Advanced Robotics, pages: 1737-1742, 2003 (inproceedings)

[BibTex]

Learning attractor landscapes for learning motor primitives

Ijspeert, A., Nakanishi, J., Schaal, S.

In Advances in Neural Information Processing Systems 15, pages: 1547-1554, (Editors: Becker, S.;Thrun, S.;Obermayer, K.), Cambridge, MA: MIT Press, 2003, clmc (inproceedings)

Abstract
If globally high dimensional data has locally only low dimensional distributions, it is advantageous to perform a local dimensionality reduction before further processing the data. In this paper we examine several techniques for local dimensionality reduction in the context of locally weighted linear regression. As possible candidates, we derive local versions of factor analysis regression, principal component regression, principal component regression on joint distributions, and partial least squares regression. After outlining the statistical bases of these methods, we perform Monte Carlo simulations to evaluate their robustness with respect to violations of their statistical assumptions. One surprising outcome is that locally weighted partial least squares regression offers the best average results, thus outperforming even factor analysis, the theoretically most appealing of our candidate techniques.

link (url) [BibTex]

PANTER-prototype for a fast-running quadruped robot with pneumatic muscles

Albiez, J., Kerscher, T., Grimminger, F., Hochholdinger, U., Dillmann, R., Berns, K.

In Proceedings of the 6th International Conference on Climbing and Walking Robots, pages: 617-624, 2003 (inproceedings)

[BibTex]

Learning from demonstration and adaptation of biped locomotion with dynamical movement primitives

Nakanishi, J., Morimoto, J., Endo, G., Schaal, S., Kawato, M.

In Workshop on Robot Learning by Demonstration, IEEE International Conference on Intelligent Robots and Systems (IROS 2003), Las Vegas, NV, Oct. 27-31, 2003, clmc (inproceedings)

Abstract
In this paper, we report on our research for learning biped locomotion from human demonstration. Our ultimate goal is to establish a design principle of a controller in order to achieve natural human-like locomotion. We suggest dynamical movement primitives as a CPG of a biped robot, an approach we have previously proposed for learning and encoding complex human movements. Demonstrated trajectories are learned through the movement primitives by locally weighted regression, and the frequency of the learned trajectories is adjusted automatically by a novel frequency adaptation algorithm based on phase resetting and entrainment of oscillators. Numerical simulations demonstrate the effectiveness of the proposed locomotion controller.

link (url) [BibTex]

Movement planning and imitation by shaping nonlinear attractors

Schaal, S.

In Proceedings of the 12th Yale Workshop on Adaptive and Learning Systems, Yale University, New Haven, CT, 2003, clmc (inproceedings)

Abstract
Given the continuous stream of movements that biological systems exhibit in their daily activities, an account for such versatility and creativity has to assume that movement sequences consist of segments, executed either in sequence or with partial or complete overlap. Therefore, a fundamental question that has pervaded research in motor control both in artificial and biological systems revolves around identifying movement primitives (a.k.a. units of actions, basis behaviors, motor schemas, etc.). What are the fundamental building blocks that are strung together, adapted to, and created for ever new behaviors? This paper summarizes results that led to the hypothesis of Dynamic Movement Primitives (DMP). DMPs are units of action that are formalized as stable nonlinear attractor systems. They are useful for autonomous robotics as they are highly flexible in creating complex rhythmic (e.g., locomotion) and discrete (e.g., a tennis swing) behaviors that can quickly be adapted to the inevitable perturbations of a dynamically changing, stochastic environment. Moreover, DMPs provide a formal framework that also lends itself to investigations in computational neuroscience. A recent finding that allows creating DMPs with the help of well-understood statistical learning methods has elevated DMPs from a more heuristic to a principled modeling approach, and, moreover, created a new foundation for imitation learning. Theoretical insights, evaluations on a humanoid robot, and behavioral and brain imaging data will serve to outline the framework of DMPs for a general approach to motor control and imitation in robotics and biology.

link (url) [BibTex]

2002


Learning rhythmic movements by demonstration using nonlinear oscillators

Ijspeert, J. A., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2002), pages: 958-963, Piscataway, NJ: IEEE, Lausanne, Sept.30-Oct.4 2002, 2002, clmc (inproceedings)

Abstract
Locally weighted learning (LWL) is a class of statistical learning techniques that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL, memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional beliefs that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested in up to 50 dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing of a humanoid robot arm, and inverse-dynamics learning for a seven degree-of-freedom robot.

link (url) [BibTex]

Reliable stair climbing in the simple hexapod ’RHex’

Moore, E. Z., Campbell, D., Grimminger, F., Buehler, M.

In Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292), 3, pages: 2222-2227 vol.3, May 2002 (inproceedings)

DOI [BibTex]

Movement imitation with nonlinear dynamical systems in humanoid robots

Ijspeert, J. A., Nakanishi, J., Schaal, S.

In International Conference on Robotics and Automation (ICRA2002), Washington, May 11-15 2002, 2002, clmc (inproceedings)

Abstract
Locally weighted learning (LWL) is a class of statistical learning techniques that provides useful representations and training algorithms for learning about complex phenomena during autonomous adaptive control of robotic systems. This paper introduces several LWL algorithms that have been tested successfully in real-time learning of complex robot tasks. We discuss two major classes of LWL, memory-based LWL and purely incremental LWL that does not need to remember any data explicitly. In contrast to the traditional beliefs that LWL methods cannot work well in high-dimensional spaces, we provide new algorithms that have been tested in up to 50 dimensional learning problems. The applicability of our LWL algorithms is demonstrated in various robot learning examples, including the learning of devil-sticking, pole-balancing of a humanoid robot arm, and inverse-dynamics learning for a seven degree-of-freedom robot.

link (url) [BibTex]

A locally weighted learning composite adaptive controller with structure adaptation

Nakanishi, J., Farrell, J. A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2002), Lausanne, Sept.30-Oct.4 2002, 2002, clmc (inproceedings)

Abstract
This paper introduces a provably stable adaptive learning controller which employs nonlinear function approximation with automatic growth of the learning network according to the nonlinearities and the working domain of the control system. The unknown function in the dynamical system is approximated by piecewise linear models using a nonparametric regression technique. Local models are allocated as necessary and their parameters are optimized on-line. Inspired by composite adaptive control methods, the proposed learning adaptive control algorithm uses both the tracking error and the estimation error to update the parameters. We provide Lyapunov analyses that demonstrate the stability properties of the learning controller. Numerical simulations illustrate rapid convergence of the tracking error and the automatic structure adaptation capability of the function approximator.

link (url) [BibTex]

1995


A kendama learning robot based on a dynamic optimization theory

Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Osu, R., Nakano, E., Kawato, M.

In Proceedings of the 4th IEEE International Workshop on Robot and Human Communication (RO-MAN’95), pages: 327-332, Tokyo, July 1995, clmc (inproceedings)

[BibTex]

1994


Robot learning by nonparametric regression

Schaal, S., Atkeson, C. G.

In Proceedings of the International Conference on Intelligent Robots and Systems (IROS’94), pages: 478-485, Munich Germany, 1994, clmc (inproceedings)

Abstract
We present an approach to robot learning grounded on a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, i.e., the (hyper-) tangent planes at every query point. Such a model, however, is only generated when a query is performed and is not retained. This is in contrast to other methods using a finite set of linear models to accomplish a piecewise linear model. Architectural parameters of our approach, such as distance metrics, are also a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: Within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.
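The query-time construction of a local linear model described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional input, a Gaussian kernel, and a fixed bandwidth (the abstract instead makes the distance metric depend on the query point), and the name `lwr_predict` and its parameters are hypothetical.

```python
import math

def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Fit a weighted line around `query` and return its prediction there.

    Assumptions (not from the paper): 1-D inputs, Gaussian kernel,
    fixed bandwidth.
    """
    # Gaussian kernel weights: stored points near the query dominate the fit.
    ws = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    # Weighted means of inputs and outputs.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted least-squares slope of the local linear model.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    # The local model exists only for this query and is then discarded.
    return my + slope * (query - mx)

# Usage: memory of samples from a sine curve, queried at x = 1.0.
xs = [0.1 * i for i in range(63)]
ys = [math.sin(x) for x in xs]
estimate = lwr_predict(xs, ys, 1.0, bandwidth=0.3)
```

Only the raw data are kept in memory; every call rebuilds the local fit, which is what lets the model behave like infinitely many linear models rather than a fixed finite set.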

[BibTex]

Assessing the quality of learned local models

Schaal, S., Atkeson, C. G.

In Advances in Neural Information Processing Systems 6, pages: 160-167, (Editors: Cowan, J.;Tesauro, G.;Alspector, J.), Morgan Kaufmann, San Mateo, CA, 1994, clmc (inproceedings)

Abstract
An approach is presented to learning high dimensional functions in the case where the learning algorithm can affect the generation of new data. A local modeling algorithm, locally weighted regression, is used to represent the learned function. Architectural parameters of the approach, such as distance metrics, are also localized and become a function of the query point instead of being global. Statistical tests are given for when a local model is good enough and sampling should be moved to a new area. Our methods explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a "center of exploration" and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach with simulation results and results from a real robot learning a complex juggling task.

link (url) [BibTex]

Memory-based robot learning

Schaal, S., Atkeson, C. G.

In IEEE International Conference on Robotics and Automation, 3, pages: 2928-2933, San Diego, CA, 1994, clmc (inproceedings)

Abstract
We present a memory-based local modeling approach to robot learning using a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, the (hyper-) tangent planes at every query point. This is in contrast to other methods using a finite set of linear models to accomplish a piece-wise linear model. Architectural parameters of our approach, such as distance metrics, are a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.

[BibTex]

Nonparametric regression for learning

Schaal, S.

In Conference on Adaptive Behavior and Learning, Center of Interdisciplinary Research (ZIF) Bielefeld Germany, also technical report TR-H-098 of the ATR Human Information Processing Research Laboratories, 1994, clmc (inproceedings)

Abstract
In recent years, learning theory has been increasingly influenced by the fact that many learning algorithms have at least in part a comprehensive interpretation in terms of well established statistical theories. Furthermore, with little modification, several statistical methods can be directly cast into learning algorithms. One family of such methods stems from nonparametric regression. This paper compares nonparametric learning with the more widely used parametric counterparts and investigates how these two families differ in their properties and their applicability. 

link (url) [BibTex]

1992


What should be learned?

Schaal, S., Atkeson, C. G., Botros, S.

In Proceedings of Seventh Yale Workshop on Adaptive and Learning Systems, pages: 199-204, New Haven, CT, May 20-22, 1992, clmc (inproceedings)

[BibTex]
