NAVER LABS Europe seminars are open to the public. This seminar is virtual and requires registration.
Date: 23rd February 2021, 10:00 am (GMT+01:00)
Learning to walk: optimizing trajectories and policies for real robots and dynamic tasks
Speaker: Nicolas Mansard is a senior researcher in the Gepetto team at LAAS-CNRS, Toulouse, France. He is one of the lead members of the team, which counts 8 permanent researchers and about 25 post-docs and PhD students. Gepetto is recognized for its expertise in legged robotics. The team developed a humanoid research platform of 3 full-size humanoid robots and a family of smaller quadrupeds. Nicolas Mansard coordinates the EU project “Memory of Motion” and holds the chair “Artificial and Natural Movement” of the Toulouse AI Lab ANITI.
He received the CNRS Bronze Medal and the award for the best computer-science project of the French research agency ANR. Before that, he was a visiting researcher at the University of Washington and a post-doctoral researcher at AIST Japan and Stanford University. He received his PhD from the University of Rennes in 2006.
Abstract: Robotics poses challenging problems in the field of artificial intelligence. Among them, legged locomotion is the problem I favor most, both for the variety of open questions it raises and for how representative it is of many other robots. Legged locomotion brings together questions of autonomy, dynamics, mobility, accuracy and speed, dimensionality and controllability. The current solution to this challenge, beautifully demonstrated by Boston Dynamics this winter, relies on tailored reduced models and trajectory optimization of the so-called centroidal dynamics. While we proudly contributed to this approach, we now believe that the next level is to produce similar movements, or better ones, without relying on ad-hoc simplifications. We will explain how careful modeling, numerical optimization and predictive control enable us to optimize the whole-body dynamics, accounting in real time for the robot's 30 motors and for various balance and collision constraints.
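To make the predictive-control idea above concrete, here is a minimal, hedged sketch (not the speaker's code): a receding-horizon loop on a toy double-integrator standing in for the 30-motor whole-body model. The dynamics, horizon, costs and the scipy-based solver are all assumptions made for illustration; the essential pattern is only that a finite-horizon optimal control problem is re-solved at every control step and the first control of the plan is applied.

```python
# Illustrative sketch only: receding-horizon control on a toy model.
import numpy as np
from scipy.optimize import minimize

DT, HORIZON = 0.05, 20          # time step and prediction horizon (assumed values)
X_GOAL = np.array([1.0, 0.0])   # target position and velocity

def step(x, u):
    """Toy dynamics: position/velocity integrator (placeholder for whole-body dynamics)."""
    return np.array([x[0] + DT * x[1], x[1] + DT * u])

def rollout_cost(u_seq, x0):
    """Tracking and effort cost accumulated along the predicted trajectory."""
    x, cost = x0, 0.0
    for u in u_seq:
        x = step(x, u)
        cost += np.sum((x - X_GOAL) ** 2) + 1e-2 * u ** 2
    return cost

def solve_ocp(x0, u_guess):
    """One finite-horizon optimal control solve, warm-started with u_guess."""
    return minimize(rollout_cost, u_guess, args=(x0,), method="L-BFGS-B").x

x = np.array([0.0, 0.0])
u_plan = np.zeros(HORIZON)
for t in range(60):                 # closed-loop MPC: re-plan, apply the first control
    u_plan = solve_ocp(x, u_plan)   # warm-start with the previous (shifted) plan
    x = step(x, u_plan[0])
    u_plan = np.roll(u_plan, -1); u_plan[-1] = 0.0
print("final state:", x)
```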
While the optimal control problem to be solved in real time is non-convex, we rely on a memory of motion, trained off-line on a large database of planned movements, to guide the numerical algorithm to the optimum. The resulting control corresponds to the optimal policy, yet it is optimized in real time from a coarse approximation by rolling out the robot's predicted movements. Our next step is to understand how reinforcement learning (RL) algorithms can be made more accurate and faster to converge, in order to obtain a faithful closed-form approximation of the optimal policy that matches the accuracy required by our legged machines. With that objective in mind, we are investigating second-order RL algorithms that exploit the derivatives of the model and locally optimal roll-outs.
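As a similarly hedged illustration of the memory-of-motion idea (again, not the speaker's code), the sketch below solves many problems off-line, stores the (task, solution) pairs in a table, and uses a nearest-neighbour lookup to warm-start a local solver on a new non-convex problem. The toy cost, the task parameter and the database size are assumptions; the second-order RL part of the abstract is not illustrated here.

```python
# Illustrative sketch only: a "memory of motion" as a nearest-neighbour warm-start table.
import numpy as np
from scipy.optimize import minimize

def cost(u, task):
    """Toy non-convex cost whose minimiser depends on the task parameter."""
    return np.sum(np.sin(3.0 * u) ** 2 + (u - task) ** 2)

# Off-line: solve many problems from scratch and store (task, solution) pairs.
rng = np.random.default_rng(0)
tasks = rng.uniform(-2.0, 2.0, size=(200, 1))
memory = [(t, minimize(cost, np.zeros(5), args=(t,)).x) for t in tasks]

# On-line: look up the closest stored task and warm-start the local solver.
def warm_started_solve(task):
    nearest = min(memory, key=lambda item: np.linalg.norm(item[0] - task))
    guess = nearest[1]                      # the memory of motion provides the initial guess
    return minimize(cost, guess, args=(task,)).x

print(warm_started_solve(np.array([0.7])))
```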