A unified framework for robot arm path planning—which combines offline modelling with inverse-solution mapping based on data-driven statistical techniques—increases computational efficiency and dramatically reduces robot-operation complexity.
Seungsu Kim, Julien Perez | 2021
Most robots at work in our world today operate in well-structured and predictable physical environments, such as car assembly lines. The robotic manipulators used in these industrial settings (often in the form of robotic arms with ‘end-effectors’, which interact with the environment like hands) are highly efficient and precise machines.
But outside of these predictable environments, robots are clumsy at performing useful tasks and struggle to perform the everyday activities that may seem simple from a human perspective, such as delivering the right package to the right person or clearing the dining table. In contrast, humans can easily manipulate unfamiliar objects in new situations and environments. This is a natural result of the way our brains develop from a young age, giving rise to the adaptable skills and complex planning abilities that make it easy for us to interact with the world.
For robots to be useful in our personal and working lives, they need not only to detect and predict the effects of changes in their environment but also to adapt their plans accordingly. They must deal with different kinds of uncertainty (such as multi-contact forces, and the deformation and wear-and-tear of robotic hardware), all of which are challenging to handle with traditional model-based techniques due to their high computational expense and the limitations of sensor technology.
Statistical approaches, which aim to find predictive functions for robot movement, are becoming dominant in this field. These approaches make it possible for models to learn from large amounts of data acquired from previous experiences or from expert human demonstrators. Compared with traditional model-based techniques (‘Robot modeling and control’, Spong et al., 2020), statistical learning techniques rely less on perception accuracy—in other words, there is more room for sensor error—and are therefore more robust when performing diverse tasks that require complex interactions with the physical world (1).
This article is the first in a series that describes different areas of research in the context of making robots part of our everyday lives. Here, we introduce recent work on statistical robot manipulation: a model for validating the kinematic feasibility of planned robot arm motion, together with the corresponding inverse-solution learning approach based on data-driven statistical techniques.
One of the challenges for general robot motion planning is validating the feasibility of movements within the robot’s constraints. In particular, solving optimization problems and finding a way out of local minima regions (where a robot becomes trapped or blocked) are time-consuming operations. One approach to tackling this issue is to model the kinematic feasibility manifold of a robot end-effector offline. Then the model can be directly queried about specific poses during operation, bypassing the need for on-the-fly calculations and thereby speeding up operation.
We therefore propose using a continuous ‘reachable manifold’ learning approach to validate how feasible movements are for a robot manipulator (2, 3). This increases efficiency in a few ways. First, a robot can plan its motion in the task space without making time-consuming iterative feasibility checks along a planned trajectory. Second, infeasible grasping poses for objects that are out of reach of the robot can be discarded. And finally, our approach provides a good indication of the richness—or number of ways (solutions)—there are for positioning a robot end-effector into a target pose (4).
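To make the first of these points concrete, here is a minimal sketch (with placeholder names, not the authors' API) of validating every waypoint of a planned trajectory with a single batched query to an offline-trained reachability model, rather than one iterative IK check per waypoint.

```python
# Hypothetical sketch: validating all waypoints of a planned trajectory with
# one batched query to an offline-trained reachability model.
# `reachability_log_density` and THRESHOLD stand in for the trained density
# model and its calibrated cut-off.
import numpy as np

THRESHOLD = -1.0  # calibrated on training data in the real system

def reachability_log_density(poses: np.ndarray) -> np.ndarray:
    """Placeholder for the learned density model over 7-D end-effector poses."""
    return -np.linalg.norm(poses[:, :3], axis=1)  # dummy score for illustration

def trajectory_is_feasible(waypoints: np.ndarray) -> bool:
    """waypoints: (N, 7) array of poses (xyz position + wxyz quaternion)."""
    return bool(np.all(reachability_log_density(waypoints) > THRESHOLD))

waypoints = np.random.randn(100, 7)
print(trajectory_is_feasible(waypoints))
```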
So, this reachable manifold provides a good indication of kinematic feasibility. It also provides an idea of the richness of inverse-mapping solutions (i.e. the number of ways that the robot, actuated in joint space, can move in Cartesian space) for a given robot task. This kind of inverse mapping problem is a classic one in robotics. However, the most widely adopted solutions are gradient-based iterative (i.e. Jacobian-based) optimization approaches. This means that, for a redundant manipulator (in other words, one that has more degrees of freedom than are necessary for the task and therefore infinitely many solutions for achieving a specific robot end-effector pose), the results are strongly affected by the initial joint configurations and any additional constraints (5). Although various learning approaches exist for this problem (6, 7, 8), achieving a one-shot joint configuration estimation remains extremely challenging.
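As a point of reference, the classic scheme of (5) resolves this redundancy at the velocity level with a null-space projection:

q̇ = J⁺(q) ẋ + (I − J⁺(q) J(q)) q̇₀

where J⁺ is the Moore–Penrose pseudo-inverse of the manipulator Jacobian and the second term projects a secondary objective q̇₀ (for example, staying close to a rest posture) into the null space of J. Both q̇₀ and the initial configuration determine which of the infinitely many solutions the iteration converges to, which is exactly the sensitivity noted above.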
Here, we consider a range of inverse solutions for a redundant manipulator to place its end-effector into a specific pose using the unified framework that we have developed.
In our approach, we train the models from a dataset that is generated using a robot motor babbling procedure within a simulation. Motor babbling is a process used in autonomous robot learning that imitates the way human infants learn. By performing random motor commands in space, a robot develops a model of its own spatial presence and its environment and thereby learns about its kinematic capability.
Because we’re considering a redundant inverse-mapping problem, there are infinitely many solutions in configuration space for placing the robot end-effector into a target pose. Hence, we propose employing a latent (or hidden) representation to encode the solutions and find a deterministic inverse-mapping function.
We use density to measure how frequently the robot end-effector visits a specific position and orientation when the robot is asked to move its joints freely. For example, an area far away from the robot might have zero or low density, whereas an area within its arm length might have high density. To estimate this density and to obtain a latent-space variable (one that can select a single solution for placing the end-effector in a specific pose), we use block neural autoregressive flows (B-NAFs) (9, 10, 11). B-NAFs are trained directly as networks and are outstanding in their ability to provide compact and universal density-approximation models.
Once the density model is trained, a given pose is considered feasible when its density exceeds the reachable density threshold. A high density indicates a richness of mappings to joint configuration space (or, in other words, a wide range of possible routes) for the end-effector pose.
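The B-NAF architecture itself is beyond the scope of a short sketch, but the density-thresholding idea can be illustrated with any density estimator. Below, a kernel density estimator from scikit-learn stands in for the learned flow; all names and values are illustrative.

```python
# Minimal sketch of the density-thresholding idea. The real system uses a
# B-NAF (9); here scikit-learn's KDE stands in for the learned density so
# the feasibility test itself stays visible.
import numpy as np
from sklearn.neighbors import KernelDensity

# Poses visited during motor babbling: xyz position + wxyz quaternion.
visited_poses = np.random.randn(10000, 7)

kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(visited_poses)

# Calibrate the threshold on held-out reachable poses, e.g. so that a fixed
# fraction (here 98%) of known-reachable poses scores above it.
log_dens_train = kde.score_samples(visited_poses[:1000])
threshold = np.quantile(log_dens_train, 0.02)

def is_reachable(pose: np.ndarray) -> bool:
    """A pose is deemed feasible when its log-density exceeds the threshold."""
    return kde.score_samples(pose.reshape(1, -1))[0] > threshold

print(is_reachable(np.zeros(7)))
```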
To demonstrate the capability of the proposed framework on a commercial robot manipulator, we use the Panda arm from Franka Emika for a typical application: object grasping.
We begin by acquiring training samples from motor-babbling simulations of the robot. Then, we compute the resulting end-effector poses from the known forward kinematics. We also check for self-collision (12) and discard any colliding samples from the dataset. Of the generated samples, we use 80% for training and the rest for testing.
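A minimal sketch of this dataset-generation step is shown below, assuming a simulator provides a forward-kinematics function and a self-collision test (both placeholders here); the joint limits are the published ranges for the 7-DOF Panda arm.

```python
# Sketch of the dataset-generation step. `fk` and `in_self_collision` are
# placeholders for a simulator's forward kinematics and collision check.
import numpy as np

rng = np.random.default_rng(0)

# Joint limits (rad) of the 7-DOF Panda arm, from the Franka datasheet.
Q_MIN = np.array([-2.8973, -1.7628, -2.8973, -3.0718, -2.8973, -0.0175, -2.8973])
Q_MAX = np.array([ 2.8973,  1.7628,  2.8973, -0.0698,  2.8973,  3.7525,  2.8973])

def fk(q):  # placeholder: joint angles -> 7-D end-effector pose
    return np.concatenate([q[:3], [1.0, 0.0, 0.0, 0.0]])

def in_self_collision(q):  # placeholder, e.g. an FCL-based check (12)
    return False

samples = []
while len(samples) < 10000:
    q = rng.uniform(Q_MIN, Q_MAX)       # motor babbling: random joint commands
    if not in_self_collision(q):
        samples.append((q, fk(q)))      # (joint config, end-effector pose)

split = int(0.8 * len(samples))         # 80% train / 20% test
train, test = samples[:split], samples[split:]
```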
For the validation dataset, we also generate kinematically infeasible samples. This requires a more complicated process because the samples must be validated analytically or probabilistically. We test each pose in reachable space by using a general Jacobian-based inverse kinematics (IK) solver (13). As the initial choice of joint angles affects the solution, we randomly choose 200 different initial configurations for each target pose. If all IK attempts fail within 200 iterations, the pose is labelled negative (infeasible); otherwise, it is labelled positive (feasible). Of the total generated samples, we find that 6.8% are feasible and the rest (93.2%) are infeasible. The feasible ratio is small because we consider the full range of orientations for the end-effector.
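The labelling procedure might look like the following sketch, where `solve_ik` is a placeholder for a generic Jacobian-based IK routine (13) that reports whether it converged.

```python
# Sketch of the negative-sample labelling step; `solve_ik` is a placeholder.
import numpy as np

rng = np.random.default_rng(1)
N_JOINTS = 7
Q_MIN = -np.pi * np.ones(N_JOINTS)  # replace with the arm's real limits
Q_MAX =  np.pi * np.ones(N_JOINTS)

def solve_ik(target_pose, q_init, max_iters=200) -> bool:
    return False  # placeholder for a Jacobian-based IK solver (13)

def label_pose(target_pose, n_restarts=200) -> bool:
    """Positive (feasible) if any of 200 random initial configurations converges."""
    for _ in range(n_restarts):
        q_init = rng.uniform(Q_MIN, Q_MAX)
        if solve_ik(target_pose, q_init, max_iters=200):
            return True
    return False  # negative (infeasible): every restart failed
```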
For the forward and inverse models (2, 3), we use fully connected neural networks with six and nine layers, respectively. The dimensions of all hidden layers are set to 500. The density model (a network that estimates the joint density of a given pose and latent variable) consists of four layers with 66 hidden dimensions. The models are trained with the Adam optimizer, using learning rates of 0.0001 for both the forward and inverse models and 0.005 for the density model. These hyperparameters were selected by cross-validation.
To calculate the losses associated with the joint-configuration reconstruction and with the end-effector pose and latent-variable reconstruction, we simply use the mean squared error.
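As an illustration of this training setup, here is a sketch in PyTorch. The layer counts, hidden sizes, learning rates and loss type follow the text; the latent dimensionality, the exact pairing of the losses and the data handling are assumptions made for the example.

```python
# Sketch of the training setup. Layer counts, hidden sizes and learning
# rates follow the text; LATENT_DIM and the loss pairing are assumptions.
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

N_JOINTS, POSE_DIM, LATENT_DIM = 7, 7, 2   # latent size is an assumption

# Forward model: joints -> (pose, latent); inverse model: (pose, latent) -> joints.
forward_net = mlp([N_JOINTS] + [500] * 5 + [POSE_DIM + LATENT_DIM])   # 6 layers
inverse_net = mlp([POSE_DIM + LATENT_DIM] + [500] * 8 + [N_JOINTS])   # 9 layers

opt_fwd = torch.optim.Adam(forward_net.parameters(), lr=1e-4)
opt_inv = torch.optim.Adam(inverse_net.parameters(), lr=1e-4)
mse = nn.MSELoss()

def training_step(q, pose, z):
    """One step on both reconstruction losses (plain MSE each)."""
    target = torch.cat([pose, z], dim=-1)
    q_hat = inverse_net(target)                     # pose + latent -> joints
    loss = mse(q_hat, q) + mse(forward_net(q_hat), target)
    opt_fwd.zero_grad(); opt_inv.zero_grad()
    loss.backward()
    opt_fwd.step(); opt_inv.step()
    return loss.item()
```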
Figure 1 shows the loss curves computed from the testing and validation sets. The reachable density threshold is selected so that a fixed fraction of training samples is marked as feasible. The threshold is then used at the next epoch to compute the true positive rate (TPR) and true negative rate (TNR), which after training were 0.99 and 0.95, respectively. Because the reachable density model is too high-dimensional to be displayed in 2D or 3D, we fix the z-position arbitrarily to obtain a few 2D examples (shown in Figure 2) of the reachable density of the robot. These examples show that different orientation constraints strongly affect the reachable space in the x-y plane.
Figure 3 shows that, by using an inverse network with a latent representation, a diverse range of solutions in joint configuration space can be achieved for a given target end-effector pose. This diversity is useful when a robot needs to satisfy constraints beyond achieving a specific pose. For example, if a human operator moves close to a robot's elbow, the robot can select a configuration that keeps its elbow away from the human while the required task is performed. The middle of Figure 3 shows the probability density in latent space for a given end-effector pose. Seven arbitrary latent values (for which the reachability score is higher than the threshold) are selected, and their joint configurations are computed directly from the inverse network.
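Sampling such diverse solutions might look like the following sketch, where `inverse_net` and `score` stand in for the trained inverse network and reachability scorer.

```python
# Sketch: draw several distinct joint-space solutions for one target pose by
# sampling latents, filtering on the reachability score and decoding with the
# inverse network. `inverse_net` and `score` are stand-ins for trained models.
import torch

def diverse_solutions(pose, inverse_net, score, threshold, n=7, latent_dim=2):
    """pose: (1, pose_dim) tensor. Returns up to n joint configurations."""
    z = torch.rand(100, latent_dim) * 2 - 1        # candidate latents in [-1, 1]
    poses = pose.expand(100, -1)
    keep = score(poses, z) > threshold             # reachability filter
    z_ok = z[keep][:n]
    with torch.no_grad():
        return inverse_net(torch.cat([pose.expand(len(z_ok), -1), z_ok], dim=-1))
```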
To check for self-colliding samples, we evaluate the resulting inverse-mapping solutions using the FCL library (12) and find that 99.7% are free of self-collision.
Although the inverse-mapping solutions still include some errors, they are small. For a task that requires high resolution, we add a refinement process, consisting of one or two iterations of the Jacobian-based IK approach, to the resulting inverse-mapping solution. The initial and rest joint configurations are set to the output of the inverse network. As we see in Figure 4, this refinement process helps to reduce the remaining end-effector pose error.
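A single refinement iteration of this kind can be written as a damped-least-squares step. In the sketch below, the Jacobian and the 6-D task-space pose error are assumed to come from the robot model, and the damping value is illustrative.

```python
# One or two damped-least-squares IK iterations, starting from the inverse
# network's output. `jacobian(q)` (6 x n_joints) and `pose_error(q, target)`
# (6-vector: position + orientation error) are assumed to come from the
# robot model.
import numpy as np

def refine(q, target, jacobian, pose_error, damping=1e-3, iters=2):
    """Refine a joint configuration toward the target end-effector pose."""
    for _ in range(iters):
        J = jacobian(q)
        e = pose_error(q, target)
        # Damped pseudo-inverse step: dq = J^T (J J^T + lambda I)^-1 e
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(6), e)
        q = q + dq
    return q
```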
We measure the online computation time of an inverse mapping both for a single query and for a batch of 512 samples (computation is performed on an Nvidia Quadro P1000). Although not directly comparable with other approaches, such as TRAC-IK (14), because ours uses a GPU whereas most other approaches use CPUs, our method is highly beneficial for robot-learning applications that are implemented to use GPUs. Additionally, operation is made vastly simpler because, once training is complete, no additional external libraries are required.
Generating a successful grasping pose is an essential component of robot object-manipulation tasks. Although numerous approaches have been proposed in this domain (14, 15, 16, 17), most focus on finding diverse robot hand poses with respect to the object coordinate system. Selecting an optimal grasping configuration among them depends highly on the target task and on the kinematic feasibility of the robot. Indeed, validating kinematic feasibility is essential for successful grasping. A conventional method, such as repeated trials of IK from different initial configurations, may be an option, but it is time-consuming and risks falling into local minima. In contrast, our approach directly provides the kinematic feasibility and the corresponding joint configurations together, without the need for iterative computation.
Among the different approaches for generating diverse grasping candidates with respect to the object coordinate system, we use 6-DOF GraspNet (15), as it provides a diverse range of possible grasps for an object given a specific gripper and achieves state-of-the-art performance (in terms of the success rate for picking up objects). In this experiment (see Figure 5), we use one of the objects, along with the pretrained model and implementation from (15, 18).
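Combining the two components might look like the following sketch: grasp candidates arrive in the object frame, are transformed into the robot base frame, and are filtered by the learned feasibility test. The frame transform and the `is_reachable` callback are placeholders, not the authors' API.

```python
# Sketch: filter grasp-generator candidates by learned kinematic feasibility.
# Candidates are 4x4 gripper poses in the object frame; `is_reachable` stands
# in for the trained reachability model's feasibility test.
import numpy as np

def select_feasible_grasps(grasp_poses_obj, T_base_obj, is_reachable):
    """Return the grasp poses, expressed in the base frame, that are reachable."""
    feasible = []
    for T_obj_grasp in grasp_poses_obj:
        T_base_grasp = T_base_obj @ T_obj_grasp   # express grasp in base frame
        if is_reachable(T_base_grasp):            # single model query, no IK loop
            feasible.append(T_base_grasp)
    return feasible
```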
We’ve introduced a unified learning framework for validating the feasibility of a desired end-effector pose and computing its diverse inverse-mapping solutions, which are essential elements of robot arm path planning. As our approach performs its modelling offline, it achieves these aims without an iterative process at execution time, making it more efficient in operation. Furthermore, our approach reduces the risk of the robot becoming stuck and of self-collision.
Although we use a latent representation, which is conceptually similar to null-space motion, our approach has a minimal representation in the latent space and provides a solution in joint position space (rather than in local velocity space). The inverse-mapping solutions do include some very small errors, but for a task that requires high accuracy, an additional refinement process can be applied to remove the remaining end-effector pose error.
In the future, we aim to apply our framework to high-level policy-learning and reinforcement-learning tasks to reduce the search space for an action policy by efficiently discarding the infeasible goal space and action space in advance. We also plan to extend the latent representation to encode useful information for robot manipulation, such as manipulability (19).
1: Probabilistic robotics. Sebastian Thrun, Wolfram Burgard and Dieter Fox. MIT Press, 2005.
2: Catching objects in flight. Seungsu Kim, Ashwini Shukla and Aude Billard. IEEE Transactions on Robotics, vol. 30, no. 5, 2014, pp. 1049–1065.
3: Gaussian mixture model for 3-DoF orientations. Seungsu Kim, Robert Haschke and Helge Ritter. Robotics and Autonomous Systems, vol. 87, 2017, pp. 28–37.
4: Learning reachable manifold and inverse mapping for a redundant robot manipulator. Seungsu Kim and Julien Perez. IEEE International Conference on Robotics and Automation (ICRA), virtual conference, 30 May–5 June 2021.
5: Automatic supervisory control of the configuration and behavior of multibody mechanisms. Alain Liégeois. IEEE Transactions on Systems, Man, and Cybernetics, vol. 7, no. 12, 1977, pp. 868–871.
6: Learning inverse kinematics. Aaron D’Souza, Sethu Vijayakumar and Stefan Schaal. IEEE International Conference on Intelligent Robots and Systems (IROS), Maui, Hawaii, USA, 29 October–3 November 2001, pp. 298–303.
7: Goal babbling permits direct learning of inverse kinematics. Matthias Rolf, Jochen J. Steil and Michael Gienger. IEEE Transactions on Autonomous Mental Development, vol. 2, no. 3, 2010, pp. 216–229.
8: Analyzing inverse problems with invertible neural networks. Lynton Ardizzone, Jakob Kruse, Sebastian Wirkert, Daniel Rahner, Eric W. Pellegrini, Ralf S. Klessen, Lena Maier-Hein et al. International Conference on Learning Representations (ICLR), New Orleans, Louisiana, USA, 6–9 May 2019.
9: Block neural autoregressive flow. Nicola De Cao, Ivan Titov and Wilker Aziz. 35th Conference on Uncertainty in Artificial Intelligence (UAI), Tel Aviv, Israel, 22–25 July 2019.
10: Neural autoregressive flows. Chin-Wei Huang, David Krueger, Alexandre Lacoste and Aaron Courville. 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018.
11: MADE: masked autoencoder for distribution estimation. Mathieu Germain, Karol Gregor, Iain Murray and Hugo Larochelle. 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015, pp. 881–889.
12: FCL: a general purpose library for collision and proximity queries. Jia Pan, Sachin Chitta and Dinesh Manocha. IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, Minnesota, USA, 14–19 May 2012, pp. 3859–3866.
13: A complete generalized solution to the inverse kinematics of robots. Andrew Goldenberg, Beno Benhabib and Robert Fenton. IEEE Journal on Robotics and Automation, vol. 1, no. 1, 1985, pp. 14–20.
14: TRAC-IK: an open-source library for improved solving of generic inverse kinematics. Patrick Beeson and Barrett Ames. IEEE-RAS 15th International Conference on Humanoid Robots, Seoul, Republic of Korea, 3–5 November 2015, pp. 928–935.
15: 6-DOF GraspNet: variational grasp generation for object manipulation. Arsalan Mousavian, Clemens Eppner and Dieter Fox. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019, pp. 2901–2910.
16: GraspNet-1Billion: a large-scale benchmark for general object grasping. Hao-Shu Fang, Chenxi Wang, Minghao Gou and Cewu Lu. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), virtual conference, 13–19 June 2020, pp. 11441–11450.
17: Supersizing self-supervision: learning to grasp from 50K tries and 700 robot hours. Lerrel Pinto and Abhinav Gupta. IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016.
18: 6-DOF GraspNet PyTorch. Jens Lundell. GitHub, 2020.
19: Geometry-aware manipulability learning, tracking, and transfer. Noémie Jaquier, Leonel Rozo, Darwin G. Caldwell and Sylvain Calinon. International Journal of Robotics Research, vol. 40, no. 2–3, 2020, pp. 624–650.