2013


Probabilistic Object Tracking Using a Range Camera

Wüthrich, M., Pastor, P., Kalakrishnan, M., Bohg, J., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3195-3202, IEEE, November 2013 (inproceedings)

Abstract
We address the problem of tracking the 6-DoF pose of an object while it is being manipulated by a human or a robot. We use a dynamic Bayesian network to perform inference and compute a posterior distribution over the current object pose. Depending on whether a robot or a human manipulates the object, we employ a process model with or without knowledge of control inputs. Observations are obtained from a range camera. As opposed to previous object tracking methods, we explicitly model self-occlusions and occlusions from the environment, e.g., the human or robotic hand. This leads to a strongly non-linear observation model and additional dependencies in the Bayesian network. We employ a Rao-Blackwellised particle filter to compute an estimate of the object pose at every time step. In a set of experiments, we demonstrate the ability of our method to accurately and robustly track the object pose in real time while it is being manipulated by a human or a robot.

arXiv Video Code Video DOI Project Page [BibTex]



Hypothesis Testing Framework for Active Object Detection

Sankaran, B., Atanasov, N., Le Ny, J., Koletschka, T., Pappas, G., Daniilidis, K.

In IEEE International Conference on Robotics and Automation (ICRA), May 2013, clmc (inproceedings)

Abstract
One of the central problems in computer vision is the detection of semantically important objects and the estimation of their pose. Most of the work in object detection has been based on single image processing and its performance is limited by occlusions and ambiguity in appearance and geometry. This paper proposes an active approach to object detection by controlling the point of view of a mobile depth camera. When an initial static detection phase identifies an object of interest, several hypotheses are made about its class and orientation. The sensor then plans a sequence of view-points, which balances the amount of energy used to move with the chance of identifying the correct hypothesis. We formulate an active M-ary hypothesis testing problem, which includes sensor mobility, and solve it using a point-based approximate POMDP algorithm. The validity of our approach is verified through simulation and experiments with real scenes captured by a Kinect sensor. The results suggest a significant improvement over static object detection.

pdf [BibTex]

Action and Goal Related Decision Variables Modulate the Competition Between Multiple Potential Targets

Enachescu, V, Christopoulos, Vassilios N, Schrater, P. R., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2013), February 2013 (inproceedings)

[BibTex]

Fusing visual and tactile sensing for 3-D object reconstruction while grasping

Ilonen, J., Bohg, J., Kyrki, V.

In IEEE International Conference on Robotics and Automation (ICRA), pages: 3547-3554, 2013 (inproceedings)

Abstract
In this work, we propose to reconstruct a complete 3-D model of an unknown object by fusion of visual and tactile information while the object is grasped. Assuming the object is symmetric, a first hypothesis of its complete 3-D shape is generated from a single view. This initial model is used to plan a grasp on the object which is then executed with a robotic manipulator equipped with tactile sensors. Given the detected contacts between the fingers and the object, the full object model including the symmetry parameters can be refined. This refined model will then allow the planning of more complex manipulation tasks. The main contribution of this work is an optimal estimation approach for the fusion of visual and tactile data applying the constraint of object symmetry. The fusion is formulated as a state estimation problem and solved with an iterative extended Kalman filter. The approach is validated experimentally using both artificial and real data from two different robotic platforms.

DOI Project Page [BibTex]

Learning Objective Functions for Manipulation

Kalakrishnan, M., Pastor, P., Righetti, L., Schaal, S.

In 2013 IEEE International Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
We present an approach to learning objective functions for robotic manipulation based on inverse reinforcement learning. Our path integral inverse reinforcement learning algorithm can deal with high-dimensional continuous state-action spaces, and only requires local optimality of demonstrated trajectories. We use L1 regularization in order to achieve feature selection, and propose an efficient algorithm to minimize the resulting convex objective function. We demonstrate our approach by applying it to two core problems in robotic manipulation. First, we learn a cost function for redundancy resolution in inverse kinematics. Second, we use our method to learn a cost function over trajectories, which is then used in optimization-based motion planning for grasping and manipulation tasks. Experimental results show that our method outperforms previous algorithms in high-dimensional settings.

link (url) DOI [BibTex]

Learning Task Error Models for Manipulation

Pastor, P., Kalakrishnan, M., Binney, J., Kelly, J., Righetti, L., Sukhatme, G. S., Schaal, S.

In 2013 IEEE Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g. due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counterbalancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing, contributing to the successful completion of 67 out of 72 grasping and manipulation tasks.

link (url) DOI [BibTex]

2012


Towards Multi-DOF model mediated teleoperation: Using vision to augment feedback

Willaert, B., Bohg, J., Van Brussel, H., Niemeyer, G.

In IEEE International Workshop on Haptic Audio Visual Environments and Games (HAVE), pages: 25-31, October 2012 (inproceedings)

Abstract
In this paper, we address some of the challenges that arise as model-mediated teleoperation is applied to systems with multiple degrees of freedom and multiple sensors. Specifically we use a system with position, force, and vision sensors to explore an environment geometry in two degrees of freedom. The inclusion of vision is proposed to alleviate the difficulties of estimating an increasing number of environment properties. Vision can furthermore increase the predictive nature of model-mediated teleoperation, by effectively predicting touch feedback before the slave is even in contact with the environment. We focus on the case of estimating the location and orientation of a local surface patch at the contact point between the slave and the environment. We describe the various information sources with their respective limitations and create a combined model estimator as part of a multi-d.o.f. model-mediated controller. An experiment demonstrates the feasibility and benefits of utilizing vision sensors in teleoperation.

DOI [BibTex]

Failure Recovery with Shared Autonomy

Sankaran, B., Pitzer, B., Osentoski, S.

In International Conference on Intelligent Robots and Systems, October 2012 (inproceedings)

Abstract
Building robots capable of long-term autonomy has been a long-standing goal of robotics research. Such systems must be capable of performing certain tasks with a high degree of robustness and repeatability. In the context of personal robotics, these tasks could range anywhere from retrieving items from a refrigerator, loading a dishwasher, to setting up a dinner table. Given the complexity of tasks there are a multitude of failure scenarios that the robot can encounter, irrespective of whether the environment is static or dynamic. For a robot to be successful in such situations, it would need to know how to recover from failures or when to ask a human for help. This paper presents a novel shared autonomy behavioral executive to address these issues. We demonstrate how this executive combines generalized logic based recovery and human intervention to achieve continuous failure-free operation. We tested the system over 250 trials of two different use case experiments. Our current algorithm drastically reduced human intervention from 26% to 4% on the first experiment and 46% to 9% on the second experiment. This system provides a new dimension to robot autonomy, where robots can exhibit long-term failure-free operation with minimal human supervision. We also discuss how the system can be generalized.

link (url) [BibTex]

Task-Based Grasp Adaptation on a Humanoid Robot

Bohg, J., Welke, K., León, B., Do, M., Song, D., Wohlkinger, W., Aldoma, A., Madry, M., Przybylski, M., Asfour, T., Marti, H., Kragic, D., Morales, A., Vincze, M.

In 10th IFAC Symposium on Robot Control, SyRoCo 2012, Dubrovnik, Croatia, September 5-7, 2012, pages: 779-786, September 2012 (inproceedings)

Abstract
In this paper, we present an approach towards autonomous grasping of objects according to their category and a given task. Recent advances in the field of object segmentation and categorization as well as task-based grasp inference have been leveraged by integrating them into one pipeline. This allows us to transfer task-specific grasp experience between objects of the same category. The effectiveness of the approach is demonstrated on the humanoid robot ARMAR-IIIa.

Video pdf DOI [BibTex]

Movement Segmentation and Recognition for Imitation Learning

Meier, F., Theodorou, E., Schaal, S.

In Fifteenth International Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands, April 2012 (inproceedings)

link (url) [BibTex]

Inverse dynamics with optimal distribution of contact forces for the control of legged robots

Righetti, L., Schaal, S.

In Dynamic Walking 2012, Pensacola, 2012 (inproceedings)

[BibTex]

Encoding of Periodic and their Transient Motions by a Single Dynamic Movement Primitive

Ernesti, J., Righetti, L., Do, M., Asfour, T., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 57-64, IEEE, Osaka, Japan, November 2012 (inproceedings)

link (url) DOI [BibTex]

Learning Force Control Policies for Compliant Robotic Manipulation

Kalakrishnan, M., Righetti, L., Pastor, P., Schaal, S.

In ICML’12 Proceedings of the 29th International Conference on Machine Learning, pages: 49-50, Edinburgh, Scotland, 2012 (inproceedings)

[BibTex]

Quadratic programming for inverse dynamics with optimal distribution of contact forces

Righetti, L., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 538-543, IEEE, Osaka, Japan, November 2012 (inproceedings)

Abstract
In this contribution we propose an inverse dynamics controller for a humanoid robot that exploits torque redundancy to minimize any combination of linear and quadratic costs in the contact forces and the commands. In addition, the controller satisfies linear equality and inequality constraints in the contact forces and the commands such as torque limits, unilateral contacts or friction cone limits. The originality of our approach resides in the formulation of the problem as a quadratic program where we only need to solve for the control commands and where the contact forces are optimized implicitly. Furthermore, we do not need a structured representation of the dynamics of the robot (i.e. an explicit computation of the inertia matrix). This is in contrast to existing methods based on quadratic programs. The controller is then robust to uncertainty in the estimation of the dynamics model and the optimization is fast enough to be implemented in high bandwidth torque control loops that are increasingly available on humanoid platforms. We demonstrate properties of our controller with simulations of a human-sized humanoid robot.

link (url) DOI [BibTex]

Towards Associative Skill Memories

Pastor, P., Kalakrishnan, M., Righetti, L., Schaal, S.

In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), pages: 309-315, IEEE, Osaka, Japan, November 2012 (inproceedings)

Abstract
Movement primitives as a basis of movement planning and control have become a popular topic in recent years. The key idea of movement primitives is that a rather small set of stereotypical movements should suffice to create a large set of complex manipulation skills. An interesting side effect of stereotypical movement is that it also creates stereotypical sensory events, e.g., in terms of kinesthetic variables, haptic variables, or, if processed appropriately, visual variables. Thus, a movement primitive executed towards a particular object in the environment will associate a large number of sensory variables that are typical for this manipulation skill. These associations can be used to increase robustness towards perturbations, and they also allow failure detection and switching towards other behaviors. We call such movement primitives augmented with sensory associations Associative Skill Memories (ASM). This paper addresses how ASMs can be acquired by imitation learning and how they can create robust manipulation skills by determining subsequent ASMs online to achieve a particular manipulation goal. Evaluations for grasping and manipulation with a Barrett WAM/Hand illustrate our approach.

link (url) DOI [BibTex]

Template-based learning of grasp selection

Herzog, A., Pastor, P., Kalakrishnan, M., Righetti, L., Asfour, T., Schaal, S.

In 2012 IEEE International Conference on Robotics and Automation, pages: 2379-2384, IEEE, Saint Paul, USA, 2012 (inproceedings)

Abstract
The ability to grasp unknown objects is an important skill for personal robots, which has been addressed by many present and past research projects, but still remains an open problem. A crucial aspect of grasping is choosing an appropriate grasp configuration, i.e. the 6d pose of the hand relative to the object and its finger configuration. Finding feasible grasp configurations for novel objects, however, is challenging because of the huge variety in shape and size of these objects. Moreover, possible configurations also depend on the specific kinematics of the robotic arm and hand in use. In this paper, we introduce a new grasp selection algorithm able to find object grasp poses based on previously demonstrated grasps. Assuming that objects with similar shapes can be grasped in a similar way, we associate to each demonstrated grasp a grasp template. The template is a local shape descriptor for a possible grasp pose and is constructed using 3d information from depth sensors. For each new object to grasp, the algorithm then finds the best grasp candidate in the library of templates. The grasp selection is also able to improve over time using the information of previous grasp attempts to adapt the ranking of the templates. We tested the algorithm on two different platforms, the Willow Garage PR2 and the Barrett WAM arm which have very different hands. Our results show that the algorithm is able to find good grasp configurations for a large set of objects from a relatively small set of demonstrations, and does indeed improve its performance over time.

link (url) DOI [BibTex]

Probabilistic depth image registration incorporating nonvisual information

Wüthrich, M., Pastor, P., Righetti, L., Billard, A., Schaal, S.

In 2012 IEEE International Conference on Robotics and Automation, pages: 3637-3644, IEEE, Saint Paul, USA, 2012 (inproceedings)

Abstract
In this paper, we derive a probabilistic registration algorithm for object modeling and tracking. In many robotics applications, such as manipulation tasks, nonvisual information about the movement of the object is available, which we will combine with the visual information. Furthermore we do not only consider observations of the object, but we also take space into account which has been observed to not be part of the object. Furthermore we are computing a posterior distribution over the relative alignment and not a point estimate as typically done in for example Iterative Closest Point (ICP). To our knowledge no existing algorithm meets these three conditions and we thus derive a novel registration algorithm in a Bayesian framework. Experimental results suggest that the proposed methods perform favorably in comparison to PCL [1] implementations of feature mapping and ICP, especially if nonvisual information is available.

link (url) DOI [BibTex]

2010


Reinforcement learning of full-body humanoid motor skills

Stulp, F., Buchli, J., Theodorou, E., Schaal, S.

In Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on, pages: 405-410, December 2010, clmc (inproceedings)

Abstract
Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is model-free, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn full-body motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.

link (url) [BibTex]

Relative Entropy Policy Search

Peters, J., Mülling, K., Altun, Y.

In Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence, pages: 1607-1612, (Editors: Fox, M. , D. Poole), AAAI Press, Menlo Park, CA, USA, Twenty-Fourth National Conference on Artificial Intelligence (AAAI-10), July 2010 (inproceedings)

Abstract
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients (Bagnell and Schneider 2003), many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It works well on typical reinforcement learning benchmark problems.

PDF Web [BibTex]

Reinforcement learning of motor skills in high dimensions: A path integral approach

Theodorou, E., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2397-2403, May 2010, clmc (inproceedings)

Abstract
Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.

link (url) [BibTex]

Inverse dynamics control of floating base systems using orthogonal decomposition

Mistry, M., Buchli, J., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 3406-3412, May 2010, clmc (inproceedings)

Abstract
Model-based control methods can be used to enable fast, dexterous, and compliant motion of robots without sacrificing control accuracy. However, implementing such techniques on floating base robots, e.g., humanoids and legged systems, is non-trivial due to under-actuation, dynamically changing constraints from the environment, and potentially closed loop kinematics. In this paper, we show how to compute the analytically correct inverse dynamics torques for model-based control of sufficiently constrained floating base rigid-body systems, such as humanoid robots with one or two feet in contact with the environment. While our previous inverse dynamics approach relied on an estimation of contact forces to compute an approximate inverse dynamics solution, here we present an analytically correct solution by using an orthogonal decomposition to project the robot dynamics onto a reduced dimensional space, independent of contact forces. We demonstrate the feasibility and robustness of our approach on a simulated floating base bipedal humanoid robot and an actual robot dog locomoting over rough terrain.

link (url) [BibTex]

Fast, robust quadruped locomotion over challenging terrain

Kalakrishnan, M., Buchli, J., Pastor, P., Mistry, M., Schaal, S.

In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages: 2665-2670, May 2010, clmc (inproceedings)

Abstract
We present a control architecture for fast quadruped locomotion over rough terrain. We approach the problem by decomposing it into many sub-systems, in which we apply state-of-the-art learning, planning, optimization and control techniques to achieve robust, fast locomotion. Unique features of our control strategy include: (1) a system that learns optimal foothold choices from expert demonstration using terrain templates, (2) a body trajectory optimizer based on the Zero-Moment Point (ZMP) stability criterion, and (3) a floating-base inverse dynamics controller that, in conjunction with force control, allows for robust, compliant locomotion over unperceived obstacles. We evaluate the performance of our controller by testing it on the LittleDog quadruped robot, over a wide variety of rough terrain of varying difficulty levels. We demonstrate the generalization ability of this controller by presenting test results from an independent external test team on terrains that have never been shown to us.

link (url) [BibTex]

Are reaching movements planned in kinematic or dynamic coordinates?

Ellmer, A., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, clmc (inproceedings)

Abstract
Whether human reaching movements are planned and optimized in kinematic (task space) or dynamic (joint or muscle space) coordinates is still an issue of debate. The first hypothesis implies that a planner produces a desired end-effector position at each point in time during the reaching movement, whereas the latter hypothesis includes the dynamics of the muscular-skeletal control system to produce a continuous end-effector trajectory. Previous work by Wolpert et al. (1995) showed that when subjects were led to believe that their straight reaching paths corresponded to curved paths as shown on a computer screen, participants adapted the true path of their hand such that they would visually perceive a straight line in visual space, even though they actually produced a curved path. These results were interpreted as supporting the stance that reaching trajectories are planned in kinematic coordinates. However, this experiment could only demonstrate that adaptation to altered paths, i.e. the position of the end-effector, did occur, but not that the precise timing of end-effector position was equally planned, i.e., the trajectory. Our current experiment aims at filling this gap by explicitly testing whether position over time, i.e. velocity, is a property of reaching movements that is planned in kinematic coordinates. In the current experiment, the velocity profiles of cursor movements corresponding to the participant's hand motions were skewed either to the left or to the right; the path itself was left unaltered. We developed an adaptation paradigm, where the skew of the velocity profile was introduced gradually and participants reported no awareness of any manipulation. Preliminary results indicate that the true hand motion of participants did not alter, i.e. there was no adaptation so as to counterbalance the introduced skew. However, for some participants, peak hand velocities were lowered for higher skews, which suggests that participants interpreted the manipulation as mere noise due to variance in their own movement. In summary, for a visuomotor transformation task, the hypothesis of a planned continuous end-effector trajectory predicts adaptation to a modified velocity profile. The current experiment found no systematic adaptation under such transformation, but did demonstrate an effect that is more consistent with subjects not perceiving the manipulation and instead interpreting it as an increase in noise.

[BibTex]

Optimality in Neuromuscular Systems

Theodorou, E. A., Valero-Cuevas, F.

In 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2010, clmc (inproceedings)

Abstract
We provide an overview of optimal control methods to nonlinear neuromuscular systems and discuss their limitations. Moreover we extend current optimal control methods to their application to neuromuscular models with realistically numerous musculotendons; as most prior work is limited to torque-driven systems. Recent work on computational motor control has explored the use of control theory and estimation as a conceptual tool to understand the underlying computational principles of neuromuscular systems. After all, successful biological systems regularly meet conditions for stability, robustness and performance for multiple classes of complex tasks. Among a variety of proposed control theory frameworks to explain this, stochastic optimal control has become a dominant framework to the point of being a standard computational technique to reproduce kinematic trajectories of reaching movements (see [12]). In particular, we demonstrate the application of optimal control to a neuromuscular model of the index finger with all seven musculotendons producing a tapping task. Our simulations include 1) a muscle model that includes force-length and force-velocity characteristics; 2) an anatomically plausible biomechanical model of the index finger that includes a tendinous network for the extensor mechanism and 3) a contact model that is based on a nonlinear spring-damper attached at the end effector of the index finger. We demonstrate that it is feasible to apply optimal control to systems with realistically large state vectors and conclude that, while optimal control is an adequate formalism to create computational models of neuromusculoskeletal systems, there remain important challenges and limitations that need to be considered and overcome such as contact transitions, curse of dimensionality, and constraints on states and controls.

PDF [BibTex]

Learning Policy Improvements with Path Integrals

Theodorou, E. A., Buchli, J., Schaal, S.

In International Conference on Artificial Intelligence and Statistics (AISTATS 2010), 2010, clmc (inproceedings)

Abstract
With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parametrized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. We believe that Policy Improvement with Path Integrals (PI2) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.

PDF [BibTex]

Learning optimal control solutions: a path integral approach

Theodorou, E., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2010), Naples, Florida, 2010, clmc (inproceedings)

Abstract
Investigating principles of human motor control in the framework of optimal control has had a long tradition in neural control of movement, and has recently experienced a new surge of investigations. Ideally, optimal control problems are addressed as a reinforcement learning (RL) problem, which would allow investigating both the process of acquiring an optimal control solution and the solution itself. Unfortunately, the applicability of RL to complex neural and biomechanical systems has been largely impossible so far due to the computational difficulties that arise in high dimensional continuous state-action spaces. As a way out, research has focussed on computing optimal control solutions with iterative optimal control methods that rely on linear and quadratic approximations of dynamical models and cost functions. These methods require perfect knowledge of the dynamics and cost functions, and they are based on gradient and Newton optimization schemes. Their applicability is also restricted to low dimensional problems due to problematic convergence in high dimensions. Moreover, the process of computing the optimal solution is removed from the learning process that might be plausible in biology. In this work, we present a new reinforcement learning method for learning optimal control solutions for motor control. This method, based on the framework of stochastic optimal control with path integrals, has a very solid theoretical foundation, while resulting in surprisingly simple learning algorithms. It is also possible to apply this approach without knowledge of the system model, and to use a wide variety of complex nonlinear cost functions for optimization. We illustrate the theoretical properties of this approach and its applicability to learning motor control tasks for reaching movements and locomotion studies.
We discuss its applicability to learning desired trajectories, variable stiffness control (co-contraction), and parameterized control policies. We also investigate the applicability to signal dependent noise control systems. We believe that the suggested method offers one of the easiest to use approaches to learning optimal control suggested in the literature so far, which makes it ideally suited for computational investigations of biological motor control.

[BibTex]

Constrained Accelerations for Controlled Geometric Reduction: Sagittal-Plane Decoupling for Bipedal Locomotion

Gregg, R., Righetti, L., Buchli, J., Schaal, S.

In 2010 10th IEEE-RAS International Conference on Humanoid Robots, pages: 1-7, IEEE, Nashville, USA, 2010 (inproceedings)

Abstract
Energy-shaping control methods have produced strong theoretical results for asymptotically stable 3D bipedal dynamic walking in the literature. In particular, geometric controlled reduction exploits robot symmetries to control momentum conservation laws that decouple the sagittal-plane dynamics, which are easier to stabilize. However, the associated control laws require high-dimensional matrix inverses multiplied with complicated energy-shaping terms, often making these control theories difficult to apply to highly redundant humanoid robots. This paper presents a first step towards the application of energy-shaping methods on real robots by casting controlled reduction into a framework of constrained accelerations for inverse dynamics control. By representing momentum conservation laws as constraints in acceleration space, we construct a general expression for desired joint accelerations that render the constraint surface invariant. By appropriately choosing an orthogonal projection, we show that the unconstrained (reduced) dynamics are decoupled from the constrained dynamics. Any acceleration-based controller can then be used to stabilize this planar subsystem, including passivity-based methods. The resulting control law is surprisingly simple and represents a practical way to employ control theoretic stability results in robotic platforms. Simulated walking of a 3D compass-gait biped shows correspondence between the new and original controllers, and simulated motions of a 16-DOF humanoid demonstrate the applicability of this method.

link (url) DOI [BibTex]

Variable impedance control - a reinforcement learning approach

Buchli, J., Theodorou, E., Stulp, F., Schaal, S.

In Robotics Science and Systems (2010), Zaragoza, Spain, June 27-30, 2010, clmc (inproceedings)

Abstract
One of the hallmarks of the performance, versatility, and robustness of biological motor control is the ability to adapt the impedance of the overall biomechanical system to different task requirements and stochastic disturbances. A transfer of this principle to robotics is desirable, for instance to enable robots to work robustly and safely in everyday human environments. It is, however, not trivial to derive variable impedance controllers for practical high DOF robotic tasks. In this contribution, we accomplish such gain scheduling with a reinforcement learning algorithm, PI2 (Policy Improvement with Path Integrals). PI2 is a model-free, sampling based learning method derived from first principles of optimal control. The PI2 algorithm requires no tuning of algorithmic parameters besides the exploration noise. The designer can thus fully focus on cost function design to specify the task. From the viewpoint of robotics, a particularly useful property of PI2 is that it can scale to problems of many DOFs, so that RL on real robotic systems becomes feasible. We sketch the PI2 algorithm and its theoretical properties, and how it is applied to gain scheduling. We evaluate our approach by presenting results on two different simulated robotic systems, a 3-DOF Phantom Premium Robot and a 6-DOF Kuka Lightweight Robot. We investigate tasks where the optimal strategy requires both tuning of the impedance of the end-effector, and tuning of a reference trajectory. The results show that we can use path integral based RL not only for planning but also to derive variable gain feedback controllers in realistic scenarios. Thus, the power of variable impedance control is made available to a wide variety of robotic systems and practical applications.

link (url) [BibTex]

Inverse dynamics with optimal distribution of ground reaction forces for legged robots

Righetti, L., Buchli, J., Mistry, M., Schaal, S.

In Proceedings of the 13th International Conference on Climbing and Walking Robots (CLAWAR), pages: 580-587, Nagoya, Japan, sep 2010 (inproceedings)

Abstract
Contact interaction with the environment is crucial in the design of locomotion controllers for legged robots, for example to prevent slipping. Therefore, it is of great importance to be able to control the effects of the robot's movements on the contact reaction forces. In this contribution, we extend a recent inverse dynamics algorithm for floating base robots to optimize the distribution of contact forces while achieving precise trajectory tracking. The resulting controller is algorithmically simple as compared to other approaches. Numerical simulations show that this result significantly increases the range of possible movements of a humanoid robot as compared to the previous inverse dynamics algorithm. We also present a simplification of the result where no inversion of the inertia matrix is needed, which is particularly relevant for practical use on a real robot. Such an algorithm becomes interesting for agile locomotion of robots on difficult terrains where the contacts with the environment are critical, such as walking over rough or slippery terrain.

DOI [BibTex]

2008


Human movement generation based on convergent flow fields: A computational model and a behavioral experiment

Hoffmann, H., Schaal, S.

In Advances in Computational Motor Control VII, Symposium at the Society for Neuroscience Meeting, Washington DC, 2008, clmc (inproceedings)

link (url) [BibTex]

Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields

Park, D., Hoffmann, H., Pastor, P., Schaal, S.

In IEEE International Conference on Humanoid Robots, 2008, clmc (inproceedings)

PDF [BibTex]

The dual role of uncertainty in force field learning

Mistry, M., Theodorou, E., Hoffmann, H., Schaal, S.

In Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM), Naples, Florida, April 29-May 4, 2008, clmc (inproceedings)

Abstract
Force field experiments have been a successful paradigm for studying the principles of planning, execution, and learning in human arm movements. Subjects have been shown to cope with the disturbances generated by force fields by learning internal models of the underlying dynamics to predict disturbance effects or by increasing arm impedance (via co-contraction) if a predictive approach becomes infeasible. Several studies have addressed the issue of uncertainty in force field learning. Scheidt et al. demonstrated that subjects exposed to a viscous force field of fixed structure but varying strength (randomly changing from trial to trial) learn to adapt to the mean disturbance, regardless of the statistical distribution. Takahashi et al. additionally show a decrease in strength of after-effects after learning in the randomly varying environment. Thus they suggest that the nervous system adopts a dual strategy: learning an internal model of the mean of the random environment, while simultaneously increasing arm impedance to minimize the consequence of errors. In this study, we examine what role variance plays in the learning of uncertain force fields. We use a 7 degree-of-freedom exoskeleton robot as a manipulandum (Sarcos Master Arm, Sarcos, Inc.), and apply a 3D viscous force field of fixed structure and strength randomly selected from trial to trial. Additionally, in separate blocks of trials, we alter the variance of the randomly selected strength multiplier (while keeping a constant mean). In each block, after sufficient learning has occurred, we apply catch trials with no force field and measure the strength of after-effects. As expected in higher variance cases, results show increasingly smaller levels of after-effects as the variance is increased, thus implying subjects choose the robust strategy of increasing arm impedance to cope with higher levels of uncertainty.
Interestingly, however, subjects show an increase in after-effect strength with a small amount of variance as compared to the deterministic (zero variance) case. This result implies that a small amount of variability aids in internal model formation, presumably a consequence of the additional amount of exploration conducted in the workspace of the task.

[BibTex]

Dynamic movement primitives for movement generation motivated by convergent force fields in frog

Hoffmann, H., Pastor, P., Schaal, S.

In Adaptive Motion of Animals and Machines (AMAM), 2008, clmc (inproceedings)

PDF [BibTex]

Behavioral experiments on reinforcement learning in human motor control

Hoffmann, H., Theodorou, E., Schaal, S.

In Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM), Naples, Florida, April 29-May 4, 2008, clmc (inproceedings)

Abstract
Reinforcement learning (RL) - learning solely based on reward or cost feedback - is widespread in robotics control and has also been suggested as a computational model for human motor control. In human motor control, however, hardly any experiment has studied reinforcement learning. Here, we study learning based on visual cost feedback in a reaching task and conduct three experiments: (1) to establish a simple enough experiment for RL, (2) to study spatial localization of RL, and (3) to study the dependence of RL on the cost function. In experiment (1), subjects sit in front of a drawing tablet and look at a screen onto which the drawing pen's position is projected. Beginning from a start point, their task is to move with the pen through a target point presented on screen. Visual feedback about the pen's position is given only before movement onset. At the end of a movement, subjects get visual feedback only about the cost of this trial. We choose as cost the squared distance between target and virtual pen position at the target line. Above a threshold value, the cost was fixed at this value. In the mapping of the pen's position onto the screen, we added a bias (unknown to the subject) and Gaussian noise. As a result, subjects could learn the bias, and thus showed reinforcement learning. In experiment (2), we randomly altered the target position between three different locations (three different directions from the start point: -45, 0, 45). For each direction, we chose a different bias. As a result, subjects learned all three bias values simultaneously. Thus, RL can be spatially localized. In experiment (3), we varied the sensitivity of the cost function by multiplying the squared distance with a constant value C, while keeping the same cut-off threshold. As in experiment (2), we had three target locations. We assigned to each location a different C value (this assignment was randomized between subjects).
Since subjects learned the three locations simultaneously, we could directly compare the effect of the different cost functions. As a result, we found an optimal C value; if C was too small (insensitive cost), learning was slow; if C was too large (narrow cost valley), the exploration time was longer and learning delayed. Thus, reinforcement learning in human motor control appears to be sen
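The cost feedback described in this abstract can be illustrated with a minimal sketch. It assumes a one-dimensional position along the target line; the function names, default values, and the use of Python's `random.Random` for the Gaussian noise are illustrative assumptions, not details taken from the study.

```python
import random


def displayed_position(pen_pos, bias, noise_sd, rng):
    """Map the pen position to the virtual (on-screen) position.

    Hypothetical 1-D model: a fixed bias, unknown to the subject,
    plus zero-mean Gaussian noise.
    """
    return pen_pos + bias + rng.gauss(0.0, noise_sd)


def trial_cost(target, virtual_pos, c=1.0, threshold=1.0):
    """Squared distance scaled by the sensitivity C, capped at the cut-off threshold."""
    return min(c * (target - virtual_pos) ** 2, threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    target = 0.0
    # The subject aims at the target; the hidden bias shifts the outcome.
    virtual = displayed_position(pen_pos=0.0, bias=0.3, noise_sd=0.05, rng=rng)
    print(trial_cost(target, virtual, c=2.0))
```

With a larger `c` the cost reaches the cap closer to the target, which corresponds to the "narrow cost valley" case mentioned in the abstract; with a small `c` the cost changes little across the workspace.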

[BibTex]

Movement generation by learning from demonstration and generalization to new targets

Pastor, P., Hoffmann, H., Schaal, S.

In Adaptive Motion of Animals and Machines (AMAM), 2008, clmc (inproceedings)

PDF [BibTex]

Combining dynamic movement primitives and potential fields for online obstacle avoidance

Park, D., Hoffmann, H., Schaal, S.

In Adaptive Motion of Animals and Machines (AMAM), Cleveland, Ohio, 2008, clmc (inproceedings)

link (url) [BibTex]

Computational model for movement learning under uncertain cost

Theodorou, E., Hoffmann, H., Mistry, M., Schaal, S.

In Abstracts of the Society of Neuroscience Meeting (SFN 2008), Washington, DC, 2008, clmc (inproceedings)

Abstract
Stochastic optimal control is a framework for computing control commands that lead to an optimal behavior under a given cost. Despite the long history of optimal control in engineering, it has only recently been applied to describe human motion. So far, stochastic optimal control has been mainly used in tasks that are already learned, such as reaching to a target. For learning, however, there are only a few cases where optimal control has been applied. The main assumptions of stochastic optimal control that restrict its application to tasks after learning are the a priori knowledge of (1) a quadratic cost function, (2) a state space model that captures the kinematics and/or dynamics of the musculoskeletal system, and (3) a measurement equation that models the proprioceptive and/or exteroceptive feedback. Under these assumptions, a sequence of control gains is computed that is optimal with respect to the prespecified cost function. In our work, we relax the assumption of the a priori known cost function and provide a computational framework for modeling tasks that involve learning. Typically, a cost function consists of two parts: one part that models the task constraints, like squared distance to goal at movement endpoint, and one part that integrates over the squared control commands. In learning a task, the first part of this cost function will be adapted. We use an expectation-maximization scheme for learning: the expectation step optimizes the task constraints through gradient descent of a reward function and the maximizing step optimizes the control commands. Our computational model is tested and compared with data from a behavioral experiment. In this experiment, subjects sit in front of a drawing tablet and look at a screen onto which the drawing pen's position is projected. Beginning from a start point, their task is to move with the pen through a target point presented on screen. Visual feedback about the pen's position is given only before movement onset.
At the end of a movement, subjects get visual feedback only about the cost of this trial. In the mapping of the pen's position onto the screen, we added a bias (unknown to subject) and Gaussian noise. Therefore the cost is a function of this bias. The subjects were asked to reach to the target and minimize this cost over trials. In this behavioral experiment, subjects could learn the bias and thus showed reinforcement learning. With our computational model, we could model the learning process over trials. Particularly, the dependence on parameters of the reward function (Gaussian width) and the modulation of movement variance over time were similar in experiment and model.

[BibTex]

A Bayesian approach to empirical local linearizations for robotics

Ting, J., D’Souza, A., Vijayakumar, S., Schaal, S.

In International Conference on Robotics and Automation (ICRA2008), Pasadena, CA, USA, May 19-23, 2008, clmc (inproceedings)

Abstract
Local linearizations are ubiquitous in the control of robotic systems. Analytical methods, if available, can be used to obtain the linearization, but in complex robotics systems where the dynamics and kinematics are often not faithfully obtainable, empirical linearization may be preferable. In this case, it is important to only use data for the local linearization that lies within a "reasonable" linear regime of the system, which can be defined from the Hessian at the point of the linearization, a quantity that is not available without an analytical model. We introduce a Bayesian approach to determine statistically what constitutes a "reasonable" local regime. We approach this problem in the context of local linear regression. In contrast to previous locally linear methods, we avoid cross-validation or complex statistical hypothesis testing techniques to find the appropriate local regime. Instead, we treat the parameters of the local regime probabilistically and use approximate Bayesian inference for their estimation. This approach results in an analytical set of iterative update equations that are easily implemented on real robotics systems for real-time applications. As in other locally weighted regressions, our algorithm also lends itself to complete nonlinear function approximation for learning empirical internal models. We sketch the derivation of our Bayesian method and provide evaluations on synthetic data and actual robot data where the analytical linearization was known.

link (url) [BibTex]

Do humans plan continuous trajectories in kinematic coordinates?

Hoffmann, H., Schaal, S.

In Abstracts of the Society of Neuroscience Meeting (SFN 2008), Washington, DC, 2008, clmc (inproceedings)

Abstract
The planning and execution of human arm movements is still unresolved. An ongoing controversy is whether we plan a movement in kinematic coordinates and convert these coordinates with an inverse internal model into motor commands (like muscle activation) or whether we combine a few muscle synergies or equilibrium points to move a hand, e.g., between two targets. The first hypothesis implies that a planner produces a desired end-effector position for all time points; the second relies on the dynamics of the musculoskeletal system for a given control command to produce a continuous end-effector trajectory. To distinguish between these two possibilities, we use a visuomotor adaptation experiment. Subjects moved a pen on a graphics tablet and observed the pen's mapped position on a screen (subjects quickly adapted to this mapping). The task was to move a cursor between two points in a given time window. In the adaptation test, we manipulated the velocity profile of the cursor feedback such that the shape of the trajectories remained unchanged (for straight paths). If humans used a kinematic plan and mapped the desired end-effector position onto control commands at each time, subjects should adapt to the above manipulation. In a similar experiment, Wolpert et al. (1995) showed adaptation to changes in the curvature of trajectories. This result, however, cannot rule out a shift of an equilibrium point or an additional synergy activation between start and end point of a movement. In our experiment, subjects did two sessions, one control session without and one with velocity-profile manipulation. To skew the velocity profile of the cursor trajectory, we added to the current velocity, v, the function 0.8*v*cos(pi + pi*x), where x is the projection of the cursor position onto the start-goal line divided by the distance from start to goal (x=0 at the start point).
As a result, subjects did not adapt to this manipulation: for all subjects, the true hand motion was not significantly modified in a direction consistent with adaptation, even though the visually presented motion differed significantly from the control motion. One may still argue that this difference in motion was insufficient to be processed visually. Thus, as a control experiment, we replayed control and modified motions to the subjects and asked which of the two motions appeared 'more natural'. Subjects chose the unperturbed motion as more natural significantly more often than chance. In summary, for a visuomotor transformation task, the hypothesis of a planned continuous end-effector trajectory predicts adaptation to a modified velocity profile. The current experiment found no adaptation under such transformation.
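The velocity-profile manipulation stated in this abstract, adding 0.8*v*cos(pi + pi*x) to the current velocity v, can be written out as a short sketch. The function name and the `gain` parameter are illustrative assumptions; only the formula itself comes from the abstract.

```python
import math


def perturbed_velocity(v, x, gain=0.8):
    """Cursor velocity after adding gain * v * cos(pi + pi * x).

    v: current hand velocity along the path.
    x: normalized progress along the start-goal line (0 at start, 1 at goal).
    """
    return v + gain * v * math.cos(math.pi + math.pi * x)


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, perturbed_velocity(1.0, x))
```

Under this formula the displayed cursor moves at roughly 0.2 v at the start, v at the midpoint, and 1.8 v at the goal, so the velocity profile is skewed while a straight path keeps its shape.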

[BibTex]

2005


Natural Actor-Critic

Peters, J., Vijayakumar, S., Schaal, S.

In Proceedings of the 16th European Conference on Machine Learning, 3720, pages: 280-291, (Editors: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L.), Springer, ECML, 2005, clmc (inproceedings)

Abstract
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.

link (url) DOI [BibTex]

Comparative experiments on task space control with redundancy resolution

Nakanishi, J., Cory, R., Mistry, M., Peters, J., Schaal, S.

In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3901-3908, Edmonton, Alberta, Canada, Aug. 2-6, IROS, 2005, clmc (inproceedings)

Abstract
Understanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundancy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, even though, from a theoretical point of view, the algorithm does not share the same beauty as some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.

link (url) DOI [BibTex]

Predicting EMG Data from M1 Neurons with Variational Bayesian Least Squares

Ting, J., D’Souza, A., Yamamoto, K., Yoshioka, T., Hoffman, D., Kakei, S., Sergio, L., Kalaska, J., Kawato, M., Strick, P., Schaal, S.

In Advances in Neural Information Processing Systems 18 (NIPS 2005), (Editors: Weiss, Y.;Schölkopf, B.;Platt, J.), Cambridge, MA: MIT Press, Vancouver, BC, Dec. 6-11, 2005, clmc (inproceedings)

Abstract
An increasing number of projects in neuroscience require the statistical analysis of high dimensional data sets, as, for instance, in predicting behavior from neural firing, or in operating artificial devices from brain recordings in brain-machine interfaces. Linear analysis techniques remain prevalent in such cases, but classical linear regression approaches are often numerically too fragile in high dimensions. In this paper, we address the question of whether EMG data collected from arm movements of monkeys can be faithfully reconstructed with linear approaches from neural activity in primary motor cortex (M1). To achieve robust data analysis, we develop a full Bayesian approach to linear regression that automatically detects and excludes irrelevant features in the data, and regularizes against overfitting. In comparison with ordinary least squares, stepwise regression, partial least squares, and a brute force combinatorial search for the most predictive input features in the data, we demonstrate that the new Bayesian method offers a superior mixture of characteristics in terms of regularization against overfitting, computational efficiency, and ease of use, demonstrating its potential as a drop-in replacement for other linear regression techniques. As neuroscientific results, our analyses demonstrate that EMG data can be well predicted from M1 neurons, further opening the path for possible real-time interfaces between brains and machines.

link (url) [BibTex]

Rapid synchronization and accurate phase-locking of rhythmic motor primitives

Pongas, D., Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2005), pages: 2911-2916, Edmonton, Alberta, Canada, Aug. 2-6, 2005, clmc (inproceedings)

Abstract
Rhythmic movement is ubiquitous in human and animal behavior, e.g., in locomotion, dancing, swimming, chewing, scratching, and music playing. A particular feature of rhythmic movement in biology is the rapid synchronization and phase locking with other rhythmic events in the environment, for instance music or visual stimuli as in ball juggling. In traditional oscillator theories of rhythmic movement generation, synchronization with another signal is relatively slow, and it is not easy to achieve accurate phase locking with a particular feature of the driving stimulus. Using a recently developed framework of dynamic motor primitives, we demonstrate a novel algorithm for very rapid synchronization of a rhythmic movement pattern, which can phase lock any feature of the movement to any particular event in the driving stimulus. As an example application, we demonstrate how an anthropomorphic robot can use imitation learning to acquire a complex drumming pattern and keep it synchronized with an external rhythm generator that changes its frequency over time.

link (url) [BibTex]

A new methodology for robot control design

Peters, J., Mistry, M., Udwadia, F. E., Schaal, S.

In The 5th ASME International Conference on Multibody Systems, Nonlinear Dynamics, and Control (MSNDC 2005), Long Beach, CA, Sept. 24-28, 2005, clmc (inproceedings)

Abstract
Gauss' principle of least constraint and its generalizations have provided useful insights for the development of tracking controllers for mechanical systems (Udwadia, 2003). Using this concept, we present a novel methodology for the design of a specific class of robot controllers. We demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic framework, and show experimental verifications on a Sarcos Master Arm robot for some of these controllers. We believe that the suggested approach unifies and simplifies the design of optimal nonlinear control laws for robots obeying rigid body dynamics equations, both with or without external constraints, holonomic or nonholonomic constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.

link (url) [BibTex]

Arm movement experiments with joint space force fields using an exoskeleton robot

Mistry, M., Mohajerian, P., Schaal, S.

In IEEE Ninth International Conference on Rehabilitation Robotics, pages: 408-413, Chicago, Illinois, June 28-July 1, 2005, clmc (inproceedings)

Abstract
A new experimental platform permits us to study a novel variety of issues of human motor control, particularly full 3-D movements involving the major seven degrees-of-freedom (DOF) of the human arm. We incorporate a seven DOF robot exoskeleton, and can minimize weight and inertia through gravity, Coriolis, and inertia compensation, such that subjects' arm movements are largely unaffected by the manipulandum. Torque perturbations can be individually applied to any or all seven joints of the human arm, thus creating novel dynamic environments, or force fields, for subjects to respond and adapt to. Our first study investigates a joint space force field where the shoulder velocity drives a disturbing force in the elbow joint. Results demonstrate that subjects learn to compensate for the force field within about 100 trials, and from the strong presence of aftereffects when removing the field in some randomized catch trials, that an inverse dynamics, or internal model, of the force field is formed by the nervous system. Interestingly, while post-learning hand trajectories return to baseline, joint space trajectories remained changed in response to the field, indicating that besides learning a model of the force field, the nervous system also chose to exploit the space to minimize the effects of the force field on the realization of the endpoint trajectory plan. Further applications for our apparatus include studies in motor system redundancy resolution and inverse kinematics, as well as rehabilitation.

link (url) [BibTex]

A unifying framework for the control of robotics systems

Peters, J., Mistry, M., Udwadia, F. E., Cory, R., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2005), pages: 1824-1831, Edmonton, Alberta, Canada, Aug. 2-6, 2005, clmc (inproceedings)

Abstract
Recently, [1] suggested deriving tracking controllers for mechanical systems using a generalization of Gauss' principle of least constraint. This method allows us to reformulate control problems as a special class of optimal control. We take this line of reasoning one step further and demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sarcos Master Arm robot for some of the derived controllers. We believe that the suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equations, with or without external constraints, with over-actuation or under-actuation, and with open-chain or closed-chain kinematics.
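One common way to write down the optimal control reformulation mentioned in this abstract is sketched below. The notation is an assumption and does not come from the abstract itself: $M$ is the inertia matrix, $F$ collects Coriolis, centrifugal, and gravity terms, $A\ddot{q} = b$ encodes the tracking task as a constraint on accelerations, and $N$ is a symmetric positive definite weighting matrix.

```latex
\begin{aligned}
&\text{Dynamics:}        && M(q)\,\ddot{q} = u + F(q,\dot{q}) \\
&\text{Task constraint:} && A(q,\dot{q},t)\,\ddot{q} = b(q,\dot{q},t) \\
&\text{Cost:}            && \min_{u}\; u^{\top} N u \\
&\text{Control law:}     && u = N^{-1/2}\left(A M^{-1} N^{-1/2}\right)^{+}\left(b - A M^{-1} F\right)
\end{aligned}
```

Under these assumptions, choosing $A = I$ and $b = \ddot{q}_{\mathrm{ref}}$ reduces the control law to $u = M\ddot{q}_{\mathrm{ref}} - F$, the familiar inverse dynamics (computed torque) controller, which illustrates how a well-known law can fall out of the generic formulation.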

link (url) [BibTex]


1994


Robot learning by nonparametric regression

Schaal, S., Atkeson, C. G.

In Proceedings of the International Conference on Intelligent Robots and Systems (IROS’94), pages: 478-485, Munich, Germany, 1994, clmc (inproceedings)

Abstract
We present an approach to robot learning grounded on a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, i.e., the (hyper-) tangent planes at every query point. Such a model, however, is only generated when a query is performed and is not retained. This is in contrast to other methods using a finite set of linear models to accomplish a piecewise linear model. Architectural parameters of our approach, such as distance metrics, are also a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: Within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.
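The idea of building a local linear model only at query time can be illustrated with a minimal one-dimensional sketch of locally weighted regression. This is not the authors' implementation: the Gaussian kernel, the fixed `bandwidth` parameter (the abstract describes a query-dependent distance metric), and the function name `lwr_predict` are assumptions made for illustration.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Fit a weighted local line around `query`; return (prediction, slope)."""
    # Gaussian kernel weights: data near the query dominate the fit.
    ws = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    # Weighted means of inputs and outputs.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted least-squares slope of the local line.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    # The model is evaluated at the query and then discarded.
    return my + slope * (query - mx), slope


# Usage: stored samples of a nonlinear function, queried at a single point.
xs = [0.1 * i for i in range(63)]
ys = [math.sin(x) for x in xs]
prediction, local_slope = lwr_predict(xs, ys, query=1.0, bandwidth=0.2)
```

All training data stay in memory and nothing is fitted until a query arrives, which is what distinguishes this approach from methods that keep a finite set of linear models.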

[BibTex]

Assessing the quality of learned local models

Schaal, S., Atkeson, C. G.

In Advances in Neural Information Processing Systems 6, pages: 160-167, (Editors: Cowan, J.; Tesauro, G.; Alspector, J.), Morgan Kaufmann, San Mateo, CA, 1994, clmc (inproceedings)

Abstract
An approach is presented to learning high dimensional functions in the case where the learning algorithm can affect the generation of new data. A local modeling algorithm, locally weighted regression, is used to represent the learned function. Architectural parameters of the approach, such as distance metrics, are also localized and become a function of the query point instead of being global. Statistical tests are given for when a local model is good enough and sampling should be moved to a new area. Our methods explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a "center of exploration" and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach with simulation results and results from a real robot learning a complex juggling task.

link (url) [BibTex]

Memory-based robot learning

Schaal, S., Atkeson, C. G.

In IEEE International Conference on Robotics and Automation, 3, pages: 2928-2933, San Diego, CA, 1994, clmc (inproceedings)

Abstract
We present a memory-based local modeling approach to robot learning using a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, the (hyper-) tangent planes at every query point. This is in contrast to other methods using a finite set of linear models to accomplish a piece-wise linear model. Architectural parameters of our approach, such as distance metrics, are a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.

[BibTex]

Nonparametric regression for learning

Schaal, S.

In Conference on Adaptive Behavior and Learning, Center of Interdisciplinary Research (ZIF), Bielefeld, Germany, also technical report TR-H-098 of the ATR Human Information Processing Research Laboratories, 1994, clmc (inproceedings)

Abstract
In recent years, learning theory has been increasingly influenced by the fact that many learning algorithms have, at least in part, a comprehensive interpretation in terms of well-established statistical theories. Furthermore, with little modification, several statistical methods can be directly cast into learning algorithms. One family of such methods stems from nonparametric regression. This paper compares nonparametric learning with its more widely used parametric counterparts and investigates how these two families differ in their properties and their applicability.

link (url) [BibTex]
