Header logo is am


2007


no image
Using reward-weighted regression for reinforcement learning of task space control

Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 262-267, Honolulu, Hawaii, April 1-5, 2007, 2007, clmc (inproceedings)

Abstract
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.

link (url) DOI [BibTex]

2007

link (url) DOI [BibTex]


no image
Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark

Riedmiller, M., Peters, J., Schaal, S.

In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages: 254-261, ADPRL, 2007, clmc (inproceedings)

Abstract
In this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.

PDF [BibTex]

PDF [BibTex]


no image
Uncertain 3D Force Fields in Reaching Movements: Do Humans Favor Robust or Average Performance?

Mistry, M., Theodorou, E., Hoffmann, H., Schaal, S.

In Abstracts of the 37th Meeting of the Society of Neuroscience, 2007, clmc (inproceedings)

PDF [BibTex]

PDF [BibTex]


no image
Applying the episodic natural actor-critic architecture to motor primitive learning

Peters, J., Schaal, S.

In Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, April 25-27, 2007, clmc (inproceedings)

Abstract
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists out of actor updates which are achieved using natural stochastic policy gradients while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the Òbuilding blocks of movement generationÓ, called motor primitives. Motor primitives are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. We show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.

link (url) [BibTex]

link (url) [BibTex]


no image
A computational model of human trajectory planning based on convergent flow fields

Hoffman, H., Schaal, S.

In Abstracts of the 37st Meeting of the Society of Neuroscience, San Diego, CA, Nov. 3-7, 2007, clmc (inproceedings)

Abstract
A popular computational model suggests that smooth reaching movements are generated in humans by minimizing a difference vector between hand and target in visual coordinates (Shadmehr and Wise, 2005). To achieve such a task, the optimal joint accelerations may be pre-computed. However, this pre-planning is inflexible towards perturbations of the limb, and there is strong evidence that reaching movements can be modified on-line at any moment during the movement. Thus, next-state planning models (Bullock and Grossberg, 1988) have been suggested that compute the current control command from a function of the goal state such that the overall movement smoothly converges to the goal (see Shadmehr and Wise (2005) for an overview). So far, these models have been restricted to simple point-to-point reaching movements with (approximately) straight trajectories. Here, we present a computational model for learning and executing arbitrary trajectories that combines ideas from pattern generation with dynamic systems and the observation of convergent force fields, which control a frog leg after spinal stimulation (Giszter et al., 1993). In our model, we incorporate the following two observations: first, the orientation of vectors in a force field is invariant over time, but their amplitude is modulated by a time-varying function, and second, two force fields add up when stimulated simultaneously (Giszter et al., 1993). This addition of convergent force fields varying over time results in a virtual trajectory (a moving equilibrium point) that correlates with the actual leg movement (Giszter et al., 1993). Our next-state planner is a set of differential equations that provide the desired end-effector or joint accelerations using feedback of the current state of the limb. These accelerations can be interpreted as resulting from a damped spring that links the current limb position with a virtual trajectory. This virtual trajectory can be learned to realize any desired limb trajectory and velocity profile, and learning is efficient since the time-modulated sum of convergent force fields equals a sum of weighted basis functions (Gaussian time pulses). Thus, linear algebra is sufficient to compute these weights, which correspond to points on the virtual trajectory. During movement execution, the differential equation corrects automatically for perturbations and brings back smoothly the limb towards the goal. Virtual trajectories can be rescaled and added allowing to build a set of movement primitives to describe movements more complex than previously learned. We demonstrate the potential of the suggested model by learning and generating a wide variety of movements.

[BibTex]

[BibTex]


no image
A Computational Model of Arm Trajectory Modification Using Dynamic Movement Primitives

Mohajerian, P., Hoffmann, H., Mistry, M., Schaal, S.

In Abstracts of the 37st Meeting of the Society of Neuroscience, San Diego, CA, Nov 3-7, 2007, clmc (inproceedings)

Abstract
Several scientists used a double-step target-displacement protocol to investigate how an unexpected upcoming new target modifies ongoing discrete movements. Interesting observations are the initial direction of the movement, the spatial path of the movement to the second target, and the amplification of the speed in the second movement. Experimental data show that the above properties are influenced by the movement reaction time and the interstimulus interval between the onset of the first and second target. Hypotheses in the literature concerning the interpretation of the observed data include a) the second movement is superimposed on the first movement (Henis and Flash, 1995), b) the first movement is aborted and the second movement is planned to smoothly connect the current state of the arm with the new target (Hoff and Arbib, 1992), c) the second movement is initiated by a new control signal that replaces the first movement's control signal, but does not take the state of the system into account (Flanagan et al., 1993), and (d) the second movement is initiated by a new goal command, but the control structure stays unchanged, and feed-back from the current state is taken into account (Hoff and Arbib, 1993). We investigate target switching from the viewpoint of Dynamic Movement Primitives (DMPs). DMPs are trajectory planning units that are formalized as stable nonlinear attractor systems (Ijspeert et al., 2002). They are a useful framework for biological motor control as they are highly flexible in creating complex rhythmic and discrete behaviors that can quickly adapt to the inevitable perturbations of dynamically changing, stochastic environments. In this model, target switching is accomplished simply by updating the target input to the discrete movement primitive for reaching. The reaching trajectory in this model can be straight or take any other route; in contrast, the Hoff and Arbib (1993) model is restricted to straight reaching movement plans. In the present study, we use DMPs to reproduce in simulation a large number of target-switching experimental data from the literature and to show that online correction and the observed target switching phenomena can be accomplished by changing the goal state of an on-going DMP, without the need to switch to different movement primitives or to re-plan the movement. :

PDF [BibTex]

PDF [BibTex]


no image
Inverse dynamics control with floating base and constraints

Nakanishi, J., Mistry, M., Schaal, S.

In International Conference on Robotics and Automation (ICRA2007), pages: 1942-1947, Rome, Italy, April 10-14, 2007, clmc (inproceedings)

Abstract
In this paper, we address the issues of compliant control of a robot under contact constraints with a goal of using joint space based pattern generators as movement primitives, as often considered in the studies of legged locomotion and biological motor control. For this purpose, we explore inverse dynamics control of constrained dynamical systems. When the system is overconstrained, it is not straightforward to formulate an inverse dynamics control law since the problem becomes an ill-posed one, where infinitely many combinations of joint torques are possible to achieve the desired joint accelerations. The goal of this paper is to develop a general and computationally efficient inverse dynamics algorithm for a robot with a free floating base and constraints. We suggest an approximate way of computing inverse dynamics algorithm by treating constraint forces computed with a Lagrange multiplier method as simply external forces based on FeatherstoneÕs floating base formulation of inverse dynamics. We present how all the necessary quantities to compute our controller can be efficiently extracted from FeatherstoneÕs spatial notation of robot dynamics. We evaluate the effectiveness of the suggested approach on a simulated biped robot model.

link (url) [BibTex]

link (url) [BibTex]


no image
Dynamics systems vs. optimal control ? a unifying view

Schaal, S, Mohajerian, P., Ijspeert, A.

In Progress in Brain Research, (165):425-445, 2007, clmc (inbook)

Abstract
In the past, computational motor control has been approached from at least two major frameworks: the dynamic systems approach and the viewpoint of optimal control. The dynamic system approach emphasizes motor control as a process of self-organization between an animal and its environment. Nonlinear differential equations that can model entrainment and synchronization behavior are among the most favorable tools of dynamic systems modelers. In contrast, optimal control approaches view motor control as the evolutionary or development result of a nervous system that tries to optimize rather general organizational principles, e.g., energy consumption or accurate task achievement. Optimal control theory is usually employed to develop appropriate theories. Interestingly, there is rather little interaction between dynamic systems and optimal control modelers as the two approaches follow rather different philosophies and are often viewed as diametrically opposing. In this paper, we develop a computational approach to motor control that offers a unifying modeling framework for both dynamic systems and optimal control approaches. In discussions of several behavioral experiments and some theoretical and robotics studies, we demonstrate how our computational ideas allow both the representation of self-organizing processes and the optimization of movement based on reward criteria. Our modeling framework is rather simple and general, and opens opportunities to revisit many previous modeling results from this novel unifying view.

link (url) [BibTex]

link (url) [BibTex]


no image
Kernel carpentry for onlne regression using randomly varying coefficient model

Edakunni, N. U., Schaal, S., Vijayakumar, S.

In Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India: Jan. 6-12, 2007, clmc (inproceedings)

Abstract
We present a Bayesian formulation of locally weighted learning (LWL) using the novel concept of a randomly varying coefficient model. Based on this, we propose a mechanism for multivariate non-linear regression using spatially localised linear models that learns completely independent of each other, uses only local information and adapts the local model complexity in a data driven fashion. We derive online updates for the model parameters based on variational Bayesian EM. The evaluation of the proposed algorithm against other state-of-the-art methods reveal the excellent, robust generalization performance beside surprisingly efficient time and space complexity properties. This paper, for the first time, brings together the computational efficiency and the adaptability of Õnon-competitiveÕ locally weighted learning schemes and the modeling guarantees of the Bayesian formulation.

link (url) [BibTex]

link (url) [BibTex]


no image
A robust quadruped walking gait for traversing rough terrain

Pongas, D., Mistry, M., Schaal, S.

In International Conference on Robotics and Automation (ICRA2007), pages: 1474-1479, Rome, April 10-14, 2007, 2007, clmc (inproceedings)

Abstract
Legged locomotion excels when terrains become too rough for wheeled systems or open-loop walking pattern generators to succeed, i.e., when accurate foot placement is of primary importance in successfully reaching the task goal. In this paper we address the scenario where the rough terrain is traversed with a static walking gait, and where for every foot placement of a leg, the location of the foot placement was selected irregularly by a planning algorithm. Our goal is to adjust a smooth walking pattern generator with the selection of every foot placement such that the COG of the robot follows a stable trajectory characterized by a stability margin relative to the current support triangle. We propose a novel parameterization of the COG trajectory based on the current position, velocity, and acceleration of the four legs of the robot. This COG trajectory has guaranteed continuous velocity and acceleration profiles, which leads to continuous velocity and acceleration profiles of the leg movement, which is ideally suited for advanced model-based controllers. Pitch, yaw, and ground clearance of the robot are easily adjusted automatically under any terrain situation. We evaluate our gait generation technique on the Little-Dog quadruped robot when traversing complex rocky and sloped terrains.

link (url) [BibTex]

link (url) [BibTex]


no image
Bayesian Nonparametric Regression with Local Models

Ting, J., Schaal, S.

In Workshop on Robotic Challenges for Machine Learning, NIPS 2007, 2007, clmc (inproceedings)

[BibTex]

[BibTex]


no image
Task space control with prioritization for balance and locomotion

Mistry, M., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robotics Systems (IROS 2007), San Diego, CA: Oct. 29 Ð Nov. 2, 2007, clmc (inproceedings)

Abstract
This paper addresses locomotion with active balancing, via task space control with prioritization. The center of gravity (COG) and foot of the swing leg are treated as task space control points. Floating base inverse kinematics with constraints is employed, thereby allowing for a mobile platform suitable for locomotion. Different techniques of task prioritization are discussed and we clarify differences and similarities of previous suggested work. Varying levels of prioritization for control are examined with emphasis on singularity robustness and the negative effects of constraint switching. A novel controller for task space control of balance and locomotion is developed which attempts to address singularity robustness, while minimizing discontinuities created by constraint switching. Controllers are evaluated using a quadruped robot simulator engaging in a locomotion task.

link (url) [BibTex]

link (url) [BibTex]

2001


no image
Humanoid oculomotor control based on concepts of computational neuroscience

Shibata, T., Vijayakumar, S., Conradt, J., Schaal, S.

In Humanoids2001, Second IEEE-RAS International Conference on Humanoid Robots, 2001, clmc (inproceedings)

Abstract
Oculomotor control in a humanoid robot faces similar problems as biological oculomotor systems, i.e., the stabilization of gaze in face of unknown perturbations of the body, selective attention, the complexity of stereo vision and dealing with large information processing delays. In this paper, we suggest control circuits to realize three of the most basic oculomotor behaviors - the vestibulo-ocular and optokinetic reflex (VOR-OKR) for gaze stabilization, smooth pursuit for tracking moving objects, and saccades for overt visual attention. Each of these behaviors was derived from inspirations from computational neuroscience, which proves to be a viable strategy to explore novel control mechanisms for humanoid robotics. Our implementations on a humanoid robot demonstrate good performance of the oculomotor behaviors that appears natural and human-like.

link (url) [BibTex]

2001

link (url) [BibTex]


no image
Trajectory formation for imitation with nonlinear dynamical systems

Ijspeert, A., Nakanishi, J., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), pages: 752-757, Weilea, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
This article explores a new approach to learning by imitation and trajectory formation by representing movements as mixtures of nonlinear differential equations with well-defined attractor dynamics. An observed movement is approximated by finding a best fit of the mixture model to its data by a recursive least squares regression technique. In contrast to non-autonomous movement representations like splines, the resultant movement plan remains an autonomous set of nonlinear differential equations that forms a control policy which is robust to strong external perturbations and that can be modified by additional perceptual variables. This movement policy remains the same for a given target, regardless of the initial conditions, and can easily be re-used for new targets. We evaluate the trajectory formation system (TFS) in the context of a humanoid robot simulation that is part of the Virtual Trainer (VT) project, which aims at supervising rehabilitation exercises in stroke-patients. A typical rehabilitation exercise was collected with a Sarcos Sensuit, a device to record joint angular movement from human subjects, and approximated and reproduced with our imitation techniques. Our results demonstrate that multi-joint human movements can be encoded successfully, and that this system allows robust modifications of the movement policy through external variables.

link (url) [BibTex]

link (url) [BibTex]


no image
Real-time statistical learning for robotics and human augmentation

Schaal, S., Vijayakumar, S., D’Souza, A., Ijspeert, A., Nakanishi, J.

In International Symposium on Robotics Research, (Editors: Jarvis, R. A.;Zelinsky, A.), Lorne, Victoria, Austrialia Nov.9-12, 2001, clmc (inproceedings)

Abstract
Real-time modeling of complex nonlinear dynamic processes has become increasingly important in various areas of robotics and human augmentation. To address such problems, we have been developing special statistical learning methods that meet the demands of on-line learning, in particular the need for low computational complexity, rapid learning, and scalability to high-dimensional spaces. In this paper, we introduce a novel algorithm that possesses all the necessary properties by combining methods from probabilistic and nonparametric learning. We demonstrate the applicability of our methods for three different applications in humanoid robotics, i.e., the on-line learning of a full-body inverse dynamics model, an inverse kinematics model, and imitation learning. The latter application will also introduce a novel method to shape attractor landscapes of dynamical system by means of statis-tical learning.

link (url) [BibTex]

link (url) [BibTex]


no image
Robust learning of arm trajectories through human demonstration

Billard, A., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
We present a model, composed of hierarchy of artificial neural networks, for robot learning by demonstration. The model is implemented in a dynamic simulation of a 41 degrees of freedom humanoid for reproducing 3D human motion of the arm. Results show that the model requires few information about the desired trajectory and learns on-line the relevant features of movement. It can generalize across a small set of data to produce a qualitatively good reproduction of the demonstrated trajectory. Finally, it is shown that reproduction of the trajectory after learning is robust against perturbations.

link (url) [BibTex]

link (url) [BibTex]


no image
Overt visual attention for a humanoid robot

Vijayakumar, S., Conradt, J., Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
The goal of our research is to investigate the interplay between oculomotor control, visual processing, and limb control in humans and primates by exploring the computational issues of these processes with a biologically inspired artificial oculomotor system on an anthropomorphic robot. In this paper, we investigate the computational mechanisms for visual attention in such a system. Stimuli in the environment excite a dynamical neural network that implements a saliency map, i.e., a winner-take-all competition between stimuli while simultenously smoothing out noise and suppressing irrelevant inputs. In real-time, this system computes new targets for the shift of gaze, executed by the head-eye system of the robot. The redundant degrees-of- freedom of the head-eye system are resolved through a learned inverse kinematics with optimization criterion. We also address important issues how to ensure that the coordinate system of the saliency map remains correct after movement of the robot. The presented attention system is built on principled modules and generally applicable for any sensory modality.

link (url) [BibTex]

link (url) [BibTex]


no image
Learning inverse kinematics

D’Souza, A., Vijayakumar, S., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), Piscataway, NJ: IEEE, Maui, Hawaii, Oct.29-Nov.3, 2001, clmc (inproceedings)

Abstract
Real-time control of the endeffector of a humanoid robot in external coordinates requires computationally efficient solutions of the inverse kinematics problem. In this context, this paper investigates learning of inverse kinematics for resolved motion rate control (RMRC) employing an optimization criterion to resolve kinematic redundancies. Our learning approach is based on the key observations that learning an inverse of a non uniquely invertible function can be accomplished by augmenting the input representation to the inverse model and by using a spatially localized learning approach. We apply this strategy to inverse kinematics learning and demonstrate how a recently developed statistical learning algorithm, Locally Weighted Projection Regression, allows efficient learning of inverse kinematic mappings in an incremental fashion even when input spaces become rather high dimensional. The resulting performance of the inverse kinematics is comparable to Liegeois ([1]) analytical pseudo inverse with optimization. Our results are illustrated with a 30 degree-of-freedom humanoid robot.

link (url) [BibTex]

link (url) [BibTex]


no image
Biomimetic smooth pursuit based on fast learning of the target dynamics

Shibata, T., Schaal, S.

In IEEE International Conference on Intelligent Robots and Systems (IROS 2001), 2001, clmc (inproceedings)

Abstract
Following a moving target with a narrow-view foveal vision system is one of the essential oculomotor behaviors of humans and humanoids. This oculomotor behavior, called ``Smooth Pursuit'', requires accurate tracking control which cannot be achieved by a simple visual negative feedback controller due to the significant delays in visual information processing. In this paper, we present a biologically inspired and control theoretically sound smooth pursuit controller consisting of two cascaded subsystems. One is an inverse model controller for the oculomotor system, and the other is a learning controller for the dynamics of the visual target. The latter controller learns how to predict the target's motion in head coordinates such that tracking performance can be improved. We investigate our smooth pursuit system in simulations and experiments on a humanoid robot. By using a fast on-line statistical learning network, our humanoid oculomotor system is able to acquire high performance smooth pursuit after about 5 seconds of learning despite significant processing delays in the syste

link (url) [BibTex]

link (url) [BibTex]

1997


no image
Learning from demonstration

Schaal, S.

In Advances in Neural Information Processing Systems 9, pages: 1040-1046, (Editors: Mozer, M. C.;Jordan, M.;Petsche, T.), MIT Press, Cambridge, MA, 1997, clmc (inproceedings)

Abstract
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration of the human instructor. 

link (url) [BibTex]

1997

link (url) [BibTex]


no image
Robot learning from demonstration

Atkeson, C. G., Schaal, S.

In Machine Learning: Proceedings of the Fourteenth International Conference (ICML ’97), pages: 12-20, (Editors: Fisher Jr., D. H.), Morgan Kaufmann, Nashville, TN, July 8-12, 1997, 1997, clmc (inproceedings)

Abstract
The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration the robot learns a reward function from the demonstration and a task model from repeated attempts to perform the task. A policy is computed based on the learned reward function and task model. Lessons learned from an implementation on an anthropomorphic robot arm using a pendulum swing up task include 1) simply mimicking demonstrated motions is not adequate to perform this task, 2) a task planner can use a learned model and reward function to compute an appropriate policy, 3) this model-based planning process supports rapid learning, 4) both parametric and nonparametric models can be learned and used, and 5) incorporating a task level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning. 

link (url) [BibTex]

link (url) [BibTex]


no image
Local dimensionality reduction for locally weighted learning

Vijayakumar, S., Schaal, S.

In International Conference on Computational Intelligence in Robotics and Automation, pages: 220-225, Monteray, CA, July10-11, 1997, 1997, clmc (inproceedings)

Abstract
Incremental learning of sensorimotor transformations in high dimensional spaces is one of the basic prerequisites for the success of autonomous robot devices as well as biological movement systems. So far, due to sparsity of data in high dimensional spaces, learning in such settings requires a significant amount of prior knowledge about the learning task, usually provided by a human expert. In this paper we suggest a partial revision of the view. Based on empirical studies, it can been observed that, despite being globally high dimensional and sparse, data distributions from physical movement systems are locally low dimensional and dense. Under this assumption, we derive a learning algorithm, Locally Adaptive Subspace Regression, that exploits this property by combining a local dimensionality reduction as a preprocessing step with a nonparametric learning technique, locally weighted regression. The usefulness of the algorithm and the validity of its assumptions are illustrated for a synthetic data set and data of the inverse dynamics of an actual 7 degree-of-freedom anthropomorphic robot arm.

link (url) [BibTex]

link (url) [BibTex]


no image
Learning tasks from a single demonstration

Atkeson, C. G., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA97), 2, pages: 1706-1712, Piscataway, NJ: IEEE, Albuquerque, NM, 20-25 April, 1997, clmc (inproceedings)

Abstract
Learning a complex dynamic robot manoeuvre from a single human demonstration is difficult. This paper explores an approach to learning from demonstration based on learning an optimization criterion from the demonstration and a task model from repeated attempts to perform the task, and using the learned criterion and model to compute an appropriate robot movement. A preliminary version of the approach has been implemented on an anthropomorphic robot arm using a pendulum swing up task as an example

link (url) [BibTex]

link (url) [BibTex]

1994


no image
Robot learning by nonparametric regression

Schaal, S., Atkeson, C. G.

In Proceedings of the International Conference on Intelligent Robots and Systems (IROS’94), pages: 478-485, Munich Germany, 1994, clmc (inproceedings)

Abstract
We present an approach to robot learning grounded on a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, i.e., the (hyper-) tangent planes at every query point. Such a model, however, is only generated when a query is performed and is not retained. This is in contrast to other methods using a finite set of linear models to accomplish a piecewise linear model. Architectural parameters of our approach, such as distance metrics, are also a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: Within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.

[BibTex]

1994

[BibTex]


no image
Assessing the quality of learned local models

Schaal, S., Atkeson, C. G.

In Advances in Neural Information Processing Systems 6, pages: 160-167, (Editors: Cowan, J.;Tesauro, G.;Alspector, J.), Morgan Kaufmann, San Mateo, CA, 1994, clmc (inproceedings)

Abstract
An approach is presented to learning high dimensional functions in the case where the learning algorithm can affect the generation of new data. A local modeling algorithm, locally weighted regression, is used to represent the learned function. Architectural parameters of the approach, such as distance metrics, are also localized and become a function of the query point instead of being global. Statistical tests are given for when a local model is good enough and sampling should be moved to a new area. Our methods explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a "center of exploration" and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach with simulation results and results from a real robot learning a complex juggling task.

link (url) [BibTex]

link (url) [BibTex]


no image
Memory-based robot learning

Schaal, S., Atkeson, C. G.

In IEEE International Conference on Robotics and Automation, 3, pages: 2928-2933, San Diego, CA, 1994, clmc (inproceedings)

Abstract
We present a memory-based local modeling approach to robot learning using a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, the (hyper-) tangent planes at every query point. This is in contrast to other methods using a finite set of linear models to accomplish a piece-wise linear model. Architectural parameters of our approach, such as distance metrics, are a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences.

[BibTex]

[BibTex]


no image
Nonparametric regression for learning

Schaal, S.

In Conference on Adaptive Behavior and Learning, Center of Interdisciplinary Research (ZIF) Bielefeld Germany, also technical report TR-H-098 of the ATR Human Information Processing Research Laboratories, 1994, clmc (inproceedings)

Abstract
In recent years, learning theory has been increasingly influenced by the fact that many learning algorithms have at least in part a comprehensive interpretation in terms of well established statistical theories. Furthermore, with little modification, several statistical methods can be directly cast into learning algorithms. One family of such methods stems from nonparametric regression. This paper compares nonparametric learning with the more widely used parametric counterparts and investigates how these two families differ in their properties and their applicability. 

link (url) [BibTex]

link (url) [BibTex]

1991


no image
Ways to smarter CAD-systems

Ehrlenspiel, K., Schaal, S.

In Proceedings of ICED’91Heurista, pages: 10-16, (Editors: Hubka), Edition, Schriftenreihe WDK 21. Zürich, 1991, clmc (inbook)

[BibTex]

1991

[BibTex]