NAVER LABS Europe virtual seminars are open to the public. Please register here for your participation (Zoom event).
Date: 24th March 2025, 2:30 PM (CET)
ML-driven planning & control in robotics
About the speaker: Nicolas Thome is a Professor at Sorbonne University and a researcher in the ISIR (Institute of Intelligent Systems and Robotics) and MLIA (Machine Learning & Information Access) units. His research interests cover machine learning for vision and multimodal data analysis. He currently leads the “Machine Learning for Robotics” (MLR) project team at ISIR, where research is driven by a flagship project in surgical robotics. This project addresses open questions in AI, including adapting foundation models for multimodal planning in robotics, physics-informed learning methods for control, and uncertainty quantification to enhance reliability and explainability. These advances are applied to spinal surgery, aiming to improve the automation of co-manipulation and increase the acceptance of AI by patients and medical teams.
Abstract: The recent progress witnessed in AI, particularly in Machine Learning (ML) and Large Language Models (LLMs), offers promising opportunities for robotics. This talk presents recent work on incorporating ML into control and planning for robotics.
Regarding control, the PhIHP reinforcement learning (RL) framework is introduced, which combines the strengths of model-based (MB) and model-free (MF) RL methods with a physics-informed model of the environment. PhIHP demonstrates remarkable sample efficiency and provides an excellent trade-off between inference time and performance compared to state-of-the-art MB and MF approaches. An extension of this approach is proposed to handle hard constraints during inference, to address inference delays, and to learn controllers directly on robotic platforms.
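To give a flavor of the physics-informed modeling idea (this is a minimal illustrative sketch, not the actual PhIHP implementation presented in the talk; all names and the pendulum setup are assumptions), one can write the dynamics model as a known analytic physics prior plus a small learned residual that corrects for unmodeled effects such as friction:

```python
import numpy as np

# Hypothetical sketch: a physics-informed dynamics model for a pendulum.
# The analytic prior encodes known physics; a least-squares residual,
# fitted from observed transitions, corrects what the prior misses.

DT, G, L_PEND = 0.05, 9.81, 1.0  # time step, gravity, pendulum length

def physics_step(state, torque):
    """Analytic pendulum model; state = (angle, angular velocity)."""
    theta, omega = state
    omega_new = omega + DT * (-G / L_PEND * np.sin(theta) + torque)
    theta_new = theta + DT * omega_new
    return np.array([theta_new, omega_new])

class PhysicsInformedModel:
    """Physics prior + linear residual fitted to observed transitions."""
    def __init__(self):
        self.W = None  # residual weights, fitted from data

    def fit_residual(self, states, torques, next_states):
        # Residual = what the physics prior fails to predict.
        preds = np.array([physics_step(s, u) for s, u in zip(states, torques)])
        errors = next_states - preds
        X = np.hstack([states, torques[:, None]])  # features: (theta, omega, u)
        self.W, *_ = np.linalg.lstsq(X, errors, rcond=None)

    def predict(self, state, torque):
        pred = physics_step(state, torque)
        if self.W is not None:
            pred = pred + np.hstack([state, [torque]]) @ self.W
        return pred
```

A model-based planner or a model-free policy trained in imagination could then roll out `predict` instead of the real robot, which is one way such hybrids obtain their sample efficiency.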
Regarding planning, Prof. Thome has been studying methods for learning embodied agents in instruction-based planning tasks. He first analyzed recent methods that fine-tune LLMs on downstream tasks and showed that the agent’s performance strongly depends on the format of the prompt used to describe the environment; he proposed a solution to mitigate this issue, enabling the learning of more robust agents. He then explored visual instruction-based planning, where the agent perceives its environment directly through images, and introduced the VIPER framework, which combines Vision Language Models (VLMs) to perceive the environment and generate textual descriptions with an LLM to perform reasoning. VIPER is trained using a combination of Behavioral Cloning (BC) and RL, and its intermediate text representation paves the way for rich explainability solutions to better understand the agent’s decision-making process.
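The perceive-then-reason architecture can be sketched as a simple loop (a hedged illustration only, with stubbed models; the class and function names are invented and do not reflect the actual VIPER code): a VLM maps each image to a textual scene description, and an LLM maps the instruction, description, and action history to the next action, so every step leaves an inspectable text trace.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch of a VLM -> text -> LLM -> action pipeline.
# Both models are passed in as callables so the loop runs with stubs.

@dataclass
class PerceiveReasonAgent:
    vlm: Callable[[object], str]   # image -> textual scene description
    llm: Callable[[str], str]      # prompt -> next action string
    history: List[str] = field(default_factory=list)

    def act(self, instruction: str, image: object) -> Tuple[str, str]:
        description = self.vlm(image)  # perceive: image -> text
        prompt = (f"Instruction: {instruction}\n"
                  f"Scene: {description}\n"
                  f"History: {'; '.join(self.history) or 'none'}\n"
                  "Next action:")
        action = self.llm(prompt)      # reason: text -> action
        self.history.append(action)    # text trace supports explainability
        return action, description
```

Because the intermediate representation is plain text, one can log the description and prompt at every step to inspect why the agent chose a given action:

```python
stub_vlm = lambda img: "a red block on the table"
stub_llm = lambda prompt: "pick_up(red_block)" if "red block" in prompt else "noop"
agent = PerceiveReasonAgent(vlm=stub_vlm, llm=stub_llm)
action, description = agent.act("put the red block in the bin", image=None)
```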