A unified framework for robot arm path planning—which combines offline modelling with inverse-solution mapping based on data-driven statistical techniques—increases computational efficiency and dramatically reduces robot-operation complexity.
Seungsu Kim, Julien Perez | 2021
Most robots at work in our world today operate in well-structured and predictable physical environments, such as car assembly lines. The robotic manipulators used in these industrial settings (often in the form of robotic arms with ‘end-effectors’, which interact with the environment like hands) are highly efficient and precise machines.
But outside of these predictable environments, robots are clumsy at performing useful tasks and struggle to perform the everyday activities that may seem simple from a human perspective, such as delivering the right package to the right person or clearing the dining table. In contrast, humans can easily manipulate unfamiliar objects in new situations and environments. This is a natural result of the way our brains develop from a young age, giving rise to the adaptable skills and complex planning abilities that make it easy for us to interact with the world.
For robots to be useful in our personal and working lives, they need not only to detect and predict the effects of changes in their environment but also to adapt their plans accordingly. They must deal with different kinds of uncertainty (such as multi-contact forces, and the deformation and wear-and-tear of robotic hardware), all of which are challenging to handle with traditional model-based techniques due to their high computational expense and the limitations of sensor technology.
Statistical approaches, which aim to find predictive functions for robot movement, are becoming dominant in this field. These approaches make it possible for models to learn from large amounts of data acquired from previous experiences or from expert human demonstrators. Compared with traditional model-based techniques (‘Robot modeling and control’, Spong et al., 2020), statistical learning techniques rely less on perception accuracy—in other words, there is more room for sensor error—and are therefore more robust when performing diverse tasks that require complex interactions with the physical world (1).
This article is the first in a series that describes different areas of research in the context of making robots part of our everyday lives. Here, we introduce recent work on statistical robot manipulation: a model for validating the kinematic feasibility of planned robot arm motion, together with the corresponding inverse-solution learning approach based on data-driven statistical techniques.
One of the challenges for general robot motion planning is validating the feasibility of movements within the robot’s constraints. In particular, solving optimization problems and finding a way out of local minima regions (where a robot becomes trapped or blocked) are time-consuming operations. One approach to tackling this issue is to model the kinematic feasibility manifold of a robot end-effector offline. Then the model can be directly queried about specific poses during operation, bypassing the need for on-the-fly calculations and thereby speeding up operation.
We therefore propose using a continuous ‘reachable manifold’ learning approach to validate how feasible movements are for a robot manipulator (2, 3). This increases efficiency in a few ways. First, a robot can plan its motion in the task space without making time-consuming iterative feasibility checks along a planned trajectory. Second, infeasible grasping poses for objects that are out of reach of the robot can be discarded. And finally, our approach provides a good indication of the richness—or number of ways (solutions)—there are for positioning a robot end-effector into a target pose (4).
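To make the first of these points concrete, here is a minimal sketch (with placeholder names, not the authors' API) of validating every waypoint of a planned trajectory with a single batched query to an offline-trained reachability model, rather than one iterative IK check per waypoint.

```python
# Hypothetical sketch: validating all waypoints of a planned trajectory with
# one batched query to an offline-trained reachability model.
# `reachability_log_density` and THRESHOLD stand in for the trained density
# model and its calibrated cut-off.
import numpy as np

THRESHOLD = -1.0  # calibrated on training data in the real system

def reachability_log_density(poses: np.ndarray) -> np.ndarray:
    """Placeholder for the learned density model over 7-D end-effector poses."""
    return -np.linalg.norm(poses[:, :3], axis=1)  # dummy score for illustration

def trajectory_is_feasible(waypoints: np.ndarray) -> bool:
    """waypoints: (N, 7) array of poses (xyz position + wxyz quaternion)."""
    return bool(np.all(reachability_log_density(waypoints) > THRESHOLD))

waypoints = np.random.randn(100, 7)
print(trajectory_is_feasible(waypoints))
```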
So, this reachable manifold provides a good indication of kinematic feasibility. It also provides an idea of the richness of inverse-mapping solutions (i.e. the number of ways that the robot, actuated in joint space, can move in Cartesian space) for a given robot task. This kind of inverse mapping problem is a classic one in robotics. However, the most widely adopted solutions are gradient-based iterative (i.e. Jacobian-based) optimization approaches. This means that, for a redundant manipulator (in other words, one that has more degrees of freedom than are necessary for the task and therefore infinitely many solutions for achieving a specific robot end-effector pose), the results are strongly affected by the initial joint configurations and any additional constraints (5). Although various learning approaches exist for this problem (6, 7, 8), achieving a one-shot joint configuration estimation remains extremely challenging.
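As a point of reference, the classic scheme of (5) resolves this redundancy at the velocity level with a null-space projection:

q̇ = J⁺(q) ẋ + (I − J⁺(q) J(q)) q̇₀

where J⁺ is the Moore–Penrose pseudo-inverse of the manipulator Jacobian and the second term projects a secondary objective q̇₀ (for example, staying close to a rest posture) into the null space of J. Both q̇₀ and the initial configuration determine which of the infinitely many solutions the iteration converges to, which is exactly the sensitivity noted above.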
Here, we consider a range of inverse solutions for a redundant manipulator to place its end-effector into a specific pose using the unified framework that we have developed.
In our approach, we train the models from a dataset that is generated using a robot motor babbling procedure within a simulation. Motor babbling is a process used in autonomous robot learning that imitates the way human infants learn. By performing random motor commands in space, a robot develops a model of its own spatial presence and its environment and thereby learns about its kinematic capability.
Because we’re considering a redundant inverse-mapping problem, there are infinitely many solutions in configuration space for placing the robot end-effector into a target pose. Hence, we propose employing a latent (or hidden) representation to encode the solutions and find a deterministic inverse-mapping function.
We use density to measure how frequently the robot end-effector visits a specific position and orientation when the robot is asked to move its joints freely. For example, an area far away from the robot might have zero or low density, whereas an area within its arm length might have high density. To estimate this density and to obtain a latent-space variable (one that can select a single solution for placing the end-effector in a specific pose), we use block neural autoregressive flows (B-NAFs) (9, 10, 11). B-NAFs are trained directly as networks and are outstanding in their ability to provide compact and universal density-approximation models.
Once the density model is trained, a given pose is considered feasible when its density exceeds the reachable density threshold. A high density indicates a richness of mappings to joint configuration space (or, in other words, a wide range of possible routes) for the end-effector pose.
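The B-NAF architecture itself is beyond the scope of a short sketch, but the density-thresholding idea can be illustrated with any density estimator. Below, a kernel density estimator from scikit-learn stands in for the learned flow; all names and values are illustrative.

```python
# Minimal sketch of the density-thresholding idea. The real system uses a
# B-NAF (9); here scikit-learn's KDE stands in for the learned density so
# the feasibility test itself stays visible.
import numpy as np
from sklearn.neighbors import KernelDensity

# Poses visited during motor babbling: xyz position + wxyz quaternion.
visited_poses = np.random.randn(10000, 7)

kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(visited_poses)

# Calibrate the threshold on held-out reachable poses, e.g. so that a fixed
# fraction (here 98%) of known-reachable poses scores above it.
log_dens_train = kde.score_samples(visited_poses[:1000])
threshold = np.quantile(log_dens_train, 0.02)

def is_reachable(pose: np.ndarray) -> bool:
    """A pose is deemed feasible when its log-density exceeds the threshold."""
    return kde.score_samples(pose.reshape(1, -1))[0] > threshold

print(is_reachable(np.zeros(7)))
```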
To demonstrate the capability of the proposed framework on a commercial robot manipulator, we use the Panda arm from Franka Emika for a typical application: object grasping.
We begin by acquiring training samples from motor-babbling simulations of the robot. Then, we compute the resulting end-effector poses from the known forward kinematics. We also check for self-collision (12) and discard any colliding samples from the dataset. Of the generated samples, we use 80% for training and the rest for testing.
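A minimal sketch of this dataset-generation step is shown below, assuming a simulator provides a forward-kinematics function and a self-collision test (both placeholders here); the joint limits are the published ranges for the 7-DOF Panda arm.

```python
# Sketch of the dataset-generation step. `fk` and `in_self_collision` are
# placeholders for a simulator's forward kinematics and collision check.
import numpy as np

rng = np.random.default_rng(0)

# Joint limits (rad) of the 7-DOF Panda arm, from the Franka datasheet.
Q_MIN = np.array([-2.8973, -1.7628, -2.8973, -3.0718, -2.8973, -0.0175, -2.8973])
Q_MAX = np.array([ 2.8973,  1.7628,  2.8973, -0.0698,  2.8973,  3.7525,  2.8973])

def fk(q):  # placeholder: joint angles -> 7-D end-effector pose
    return np.concatenate([q[:3], [1.0, 0.0, 0.0, 0.0]])

def in_self_collision(q):  # placeholder, e.g. an FCL-based check (12)
    return False

samples = []
while len(samples) < 10000:
    q = rng.uniform(Q_MIN, Q_MAX)       # motor babbling: random joint commands
    if not in_self_collision(q):
        samples.append((q, fk(q)))      # (joint config, end-effector pose)

split = int(0.8 * len(samples))         # 80% train / 20% test
train, test = samples[:split], samples[split:]
```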
For the validation dataset, we also generate kinematically infeasible samples. This requires a more complicated process because the samples must be validated analytically or probabilistically. We test each pose in reachable space by using a general Jacobian-based inverse kinematics (IK) solver (13). As the initial choice of joint angles affects the solution, we randomly choose 200 different initial configurations for each target pose. If all IK attempts fail within 200 iterations, the pose is labelled negative (infeasible); otherwise, it is labelled positive (feasible). Of the total generated samples, we find that 6.8% are feasible and the rest (93.2%) are infeasible. The feasible ratio is small because we consider the full range of orientations for the end-effector.
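The labelling procedure might look like the following sketch, where `solve_ik` is a placeholder for a generic Jacobian-based IK routine (13) that reports whether it converged.

```python
# Sketch of the negative-sample labelling step; `solve_ik` is a placeholder.
import numpy as np

rng = np.random.default_rng(1)
N_JOINTS = 7
Q_MIN = -np.pi * np.ones(N_JOINTS)  # replace with the arm's real limits
Q_MAX =  np.pi * np.ones(N_JOINTS)

def solve_ik(target_pose, q_init, max_iters=200) -> bool:
    return False  # placeholder for a Jacobian-based IK solver (13)

def label_pose(target_pose, n_restarts=200) -> bool:
    """Positive (feasible) if any of 200 random initial configurations converges."""
    for _ in range(n_restarts):
        q_init = rng.uniform(Q_MIN, Q_MAX)
        if solve_ik(target_pose, q_init, max_iters=200):
            return True
    return False  # negative (infeasible): every restart failed
```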
For the forward and inverse models (2, 3), we use fully connected neural networks with six and nine layers, respectively. The dimensions of all hidden layers are set to 500. The density model (a network that estimates the joint density of a given pose and latent variable) consists of four layers with 66 hidden dimensions. The models are trained with the Adam optimizer, using learning rates of 0.0001 for both the forward and inverse models and 0.005 for the density model. These hyperparameters were selected by cross-validation.
To calculate the losses associated with the joint-configuration reconstruction and with the end-effector pose and latent-variable reconstruction, we simply use the mean squared error.
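As an illustration of this training setup, here is a sketch in PyTorch. The layer counts, hidden sizes, learning rates and loss type follow the text; the latent dimensionality, the exact pairing of the losses and the data handling are assumptions made for the example.

```python
# Sketch of the training setup. Layer counts, hidden sizes and learning
# rates follow the text; LATENT_DIM and the loss pairing are assumptions.
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

N_JOINTS, POSE_DIM, LATENT_DIM = 7, 7, 2   # latent size is an assumption

# Forward model: joints -> (pose, latent); inverse model: (pose, latent) -> joints.
forward_net = mlp([N_JOINTS] + [500] * 5 + [POSE_DIM + LATENT_DIM])   # 6 layers
inverse_net = mlp([POSE_DIM + LATENT_DIM] + [500] * 8 + [N_JOINTS])   # 9 layers

opt_fwd = torch.optim.Adam(forward_net.parameters(), lr=1e-4)
opt_inv = torch.optim.Adam(inverse_net.parameters(), lr=1e-4)
mse = nn.MSELoss()

def training_step(q, pose, z):
    """One step on both reconstruction losses (plain MSE each)."""
    target = torch.cat([pose, z], dim=-1)
    q_hat = inverse_net(target)                     # pose + latent -> joints
    loss = mse(q_hat, q) + mse(forward_net(q_hat), target)
    opt_fwd.zero_grad(); opt_inv.zero_grad()
    loss.backward()
    opt_fwd.step(); opt_inv.step()
    return loss.item()
```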
Figure 1 shows the loss curves computed from the testing and validation sets. The reachable density threshold is selected so that a fixed fraction of training samples is marked as feasible. The threshold is then used at the next epoch to compute the true positive rate (TPR) and true negative rate (TNR), which after training were 0.99 and 0.95, respectively. Because the reachable density model is too high-dimensional to be displayed in 2D or 3D, we fix the z-position arbitrarily to obtain a few 2D examples (shown in Figure 2) of the reachable density of the robot. These examples show that different orientation constraints strongly affect the reachable space in the x-y plane.
Figure 3 shows that, by using an inverse network with a latent representation, a diverse range of solutions in joint configuration space can be achieved for a given target end-effector pose. This diversity is useful when a robot needs to satisfy constraints beyond achieving a specific pose. For example, if a human operator moves close to a robot's elbow, the robot can select a configuration that keeps its elbow away from the human while the required task is performed. The middle of Figure 3 shows the probability density in latent space for a given end-effector pose. Seven arbitrary latent values (for which the reachability score is higher than the threshold) are selected, and their joint configurations are computed directly from the inverse network.
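Sampling such diverse solutions might look like the following sketch, where `inverse_net` and `score` stand in for the trained inverse network and reachability scorer.

```python
# Sketch: draw several distinct joint-space solutions for one target pose by
# sampling latents, filtering on the reachability score and decoding with the
# inverse network. `inverse_net` and `score` are stand-ins for trained models.
import torch

def diverse_solutions(pose, inverse_net, score, threshold, n=7, latent_dim=2):
    """pose: (1, pose_dim) tensor. Returns up to n joint configurations."""
    z = torch.rand(100, latent_dim) * 2 - 1        # candidate latents in [-1, 1]
    poses = pose.expand(100, -1)
    keep = score(poses, z) > threshold             # reachability filter
    z_ok = z[keep][:n]
    with torch.no_grad():
        return inverse_net(torch.cat([pose.expand(len(z_ok), -1), z_ok], dim=-1))
```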
To check for self-colliding samples, we evaluate the resulting inverse-mapping solutions using the FCL library (12) and find that 99.7% are free of self-collision.
Although the inverse-mapping solutions still include some errors, they are small. For a task that requires high resolution, we add a refinement process, consisting of one or two iterations of the Jacobian-based IK approach, to the resulting inverse-mapping solution. The initial and rest joint configurations are set to the output of the inverse network. As we see in Figure 4, this refinement process helps to reduce the remaining end-effector pose error.
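A single refinement iteration of this kind can be written as a damped-least-squares step. In the sketch below, the Jacobian and the 6-D task-space pose error are assumed to come from the robot model, and the damping value is illustrative.

```python
# One or two damped-least-squares IK iterations, starting from the inverse
# network's output. `jacobian(q)` (6 x n_joints) and `pose_error(q, target)`
# (6-vector: position + orientation error) are assumed to come from the
# robot model.
import numpy as np

def refine(q, target, jacobian, pose_error, damping=1e-3, iters=2):
    """Refine a joint configuration toward the target end-effector pose."""
    for _ in range(iters):
        J = jacobian(q)
        e = pose_error(q, target)
        # Damped pseudo-inverse step: dq = J^T (J J^T + lambda I)^-1 e
        dq = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(6), e)
        q = q + dq
    return q
```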
We measure the online computation time of an inverse mapping both for a single query and for a batch of 512 samples (computation is performed on an Nvidia Quadro P1000). Although not directly comparable with other approaches, such as TRAC-IK (14), because ours uses a GPU whereas most other approaches use CPUs, our method is highly beneficial for robot-learning applications that are implemented to use GPUs. Additionally, operation is made vastly simpler because, once training is complete, no additional external libraries are required.
Generating a successful grasping pose is an essential component of robot object-manipulation tasks. Although numerous approaches have been proposed in this domain (14, 15, 16, 17), most focus on finding diverse robot hand poses with respect to the object coordinate system. Selecting an optimal grasping configuration among them depends highly on the target task and on the kinematic feasibility of the robot. Indeed, validating kinematic feasibility is essential for successful grasping. A conventional method, such as repeated trials of IK from different initial configurations, may be an option, but it is time-consuming and risks falling into local minima. In contrast, our approach directly provides the kinematic feasibility and the corresponding joint configurations together, without the need for iterative computation.
Among the different approaches for generating diverse grasping candidates with respect to the object coordinate system, we use 6-DOF GraspNet (15), as it provides a diverse range of possible grasps for an object given a specific gripper and achieves state-of-the-art performance (in terms of the success rate for picking up objects). In this experiment (see Figure 5), we use one of the objects, along with the pretrained model and implementation from (15, 18).
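Combining the two components might look like the following sketch: grasp candidates arrive in the object frame, are transformed into the robot base frame, and are filtered by the learned feasibility test. The frame transform and the `is_reachable` callback are placeholders, not the authors' API.

```python
# Sketch: filter grasp-generator candidates by learned kinematic feasibility.
# Candidates are 4x4 gripper poses in the object frame; `is_reachable` stands
# in for the trained reachability model's feasibility test.
import numpy as np

def select_feasible_grasps(grasp_poses_obj, T_base_obj, is_reachable):
    """Return the grasp poses, expressed in the base frame, that are reachable."""
    feasible = []
    for T_obj_grasp in grasp_poses_obj:
        T_base_grasp = T_base_obj @ T_obj_grasp   # express grasp in base frame
        if is_reachable(T_base_grasp):            # single model query, no IK loop
            feasible.append(T_base_grasp)
    return feasible
```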
We’ve introduced a unified learning framework for validating the feasibility of a desired end-effector pose and computing its diverse inverse-mapping solutions, which are essential elements of robot arm path planning. As our approach performs its modelling offline, it achieves these aims without an iterative process at execution time, making it more efficient in operation. Furthermore, our approach reduces the risk of the robot becoming stuck and of self-collision.
Although we use a latent representation, which is conceptually similar to null-space motion, our approach has a minimal representation in the latent space and provides a solution in joint position space (rather than in local velocity space). The inverse-mapping solutions do include some very small errors, but for a task that requires high accuracy, an additional refinement process can be applied to remove the remaining end-effector pose error.
In the future, we aim to apply our framework to high-level policy-learning and reinforcement-learning tasks to reduce the search space for an action policy by efficiently discarding the infeasible goal space and action space in advance. We also plan to extend the latent representation to encode useful information for robot manipulation, such as manipulability (19).
1: Probabilistic robotics. Sebastian Thrun, Wolfram Burgard and Dieter Fox. MIT Press, 2005.
2: Catching objects in flight. Seungsu Kim, Ashwini Shukla and Aude Billard. IEEE Transactions on Robotics, vol. 30, no. 5, 2014, pp. 1049–1065.
3: Gaussian mixture model for 3-DoF orientations. Seungsu Kim, Robert Haschke and Helge Ritter. Robotics and Autonomous Systems, vol. 87, 2017, pp. 28–37.
4: Learning reachable manifold and inverse mapping for a redundant robot manipulator. Seungsu Kim and Julien Perez. IEEE International Conference on Robotics and Automation (ICRA), virtual conference, 30 May–5 June 2021.
5: Automatic supervisory control of the configuration and behavior of multibody mechanisms. Alain Liégeois. IEEE Transactions on Systems, Man, and Cybernetics, vol. 7, no. 12, 1977, pp. 868–871.
6: Learning inverse kinematics. Aaron D’Souza, Sethu Vijayakumar and Stefan Schaal. IEEE International Conference on Intelligent Robots and Systems (IROS), Maui, Hawaii, USA, 29 October–3 November 2001, pp. 298–303.
7: Goal babbling permits direct learning of inverse kinematics. Matthias Rolf, Jochen J. Steil and Michael Gienger. IEEE Transactions on Autonomous Mental Development, vol. 2, no. 3, 2010, pp. 216–229.
8: Analyzing inverse problems with invertible neural networks. Lynton Ardizzone, Jakob Kruse, Sebastian Wirkert, Daniel Rahner, Eric W. Pellegrini, Ralf S. Klessen, Lena Maier-Hein et al. International Conference on Learning Representations (ICLR), New Orleans, Louisiana, USA, 6–9 May 2019.
9: Block neural autoregressive flow. Nicola De Cao, Ivan Titov and Wilker Aziz. 35th Conference on Uncertainty in Artificial Intelligence (UAI), Tel Aviv, Israel, 22–25 July 2019.
10: Neural autoregressive flows. Chin-Wei Huang, David Krueger, Alexandre Lacoste and Aaron Courville. 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018.
11: MADE: masked autoencoder for distribution estimation. Mathieu Germain, Karol Gregor, Iain Murray and Hugo Larochelle. 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015, pp. 881–889.
12: FCL: a general purpose library for collision and proximity queries. Jia Pan, Sachin Chitta and Dinesh Manocha. IEEE International Conference on Robotics and Automation (ICRA), Saint Paul, Minnesota, USA, 14–19 May 2012, pp. 3859–3866.
13: A complete generalized solution to the inverse kinematics of robots. Andrew Goldenberg, Beno Benhabib and Robert Fenton. IEEE Journal on Robotics and Automation, vol. 1, no. 1, 1985, pp. 14–20.
14: TRAC-IK: an open-source library for improved solving of generic inverse kinematics. Patrick Beeson and Barrett Ames. IEEE-RAS 15th International Conference on Humanoid Robots, Seoul, Republic of Korea, 3–5 November 2015, pp. 928–935.
15: 6-DOF GraspNet: variational grasp generation for object manipulation. Arsalan Mousavian, Clemens Eppner and Dieter Fox. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019, pp. 2901–2910.
16: GraspNet-1Billion: a large-scale benchmark for general object grasping. Hao-Shu Fang, Chenxi Wang, Minghao Gou and Cewu Lu. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), virtual conference, 13–19 June 2020, pp. 11441–11450.
17: Supersizing self-supervision: learning to grasp from 50K tries and 700 robot hours. Lerrel Pinto and Abhinav Gupta. IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016.
18: 6-DOF GraspNet PyTorch. Jens Lundell. GitHub, 2020.
19: Geometry-aware manipulability learning, tracking, and transfer. Noémie Jaquier, Leonel Rozo, Darwin G. Caldwell and Sylvain Calinon. International Journal of Robotics Research, vol. 40, no. 2–3, 2020, pp. 624–650.