Making robots part of everyday life
AI for robotics research at NAVER LABS Europe is driven by the ambition to build foundation models (FMs) capable of powering versatile, real-world robotic systems. These models are conceived to generalize across diverse tasks and environments, enabling robots to seamlessly interact with, navigate and manipulate their surroundings. The approach is structured around three complementary axes: (1) developing new architectures that can effectively learn and transfer skills, (2) creating training regimes that exploit synergies between tasks and (3) devising evaluation protocols that measure performance in realistic, dynamic settings.
Progress along these axes draws on multidisciplinary expertise, combining deep learning research, robotic control, computer vision and natural language understanding. This integrated skill set allows us to tackle both perception and action, ensuring that FMs can interpret complex environments while executing precise, context-aware behaviours. By leveraging such competencies, we aim to move beyond task-specific systems towards models that exhibit adaptability, robustness and the ability to reason across domains.
Through this structured exploration, NAVER LABS Europe positions itself at the forefront of FM-driven robotics, bridging theoretical AI advances with the practical demands of embodied agents. The result is a research direction that not only pushes technical boundaries but also lays the groundwork for robots that can operate autonomously and effectively in the open world, delivering tangible benefits in guidance, assistance and service applications.
Vision
Perception to help robots understand and interact with the environment.
Visual perception is a necessary part of any intelligent system that is meant to interact with the world. Robots need to perceive the structure, objects and people in their environment to better understand the world and perform the tasks they are assigned. We combine expertise in visual representation learning, self-supervised learning and human behaviour understanding to build AI components that help robots understand and navigate their 3D environment, detect and interact with surrounding objects and people, and continuously adapt when deployed in new environments.
3D Foundation Models
Unified 3D vision models, such as our DUSt3R/MASt3R family, that integrate tasks like depth estimation, camera pose and 3D reconstruction into a single transformer-based architecture, dramatically simplifying scene understanding.
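To make this concrete, below is a minimal sketch of how such a unified model is queried, based on the public DUSt3R repository (github.com/naver/dust3r): a single forward pipeline over uncalibrated images, followed by one global alignment, yields depth, poses and a 3D reconstruction together. Module paths and the checkpoint name follow the repository's README but may differ between releases.

```python
# Minimal DUSt3R sketch: one pipeline yields depth, poses and 3D points.
# Based on the public naver/dust3r repo; names may vary across versions.
import torch
from dust3r.model import AsymmetricCroCo3DStereo
from dust3r.utils.image import load_images
from dust3r.image_pairs import make_pairs
from dust3r.inference import inference
from dust3r.cloud_opt import global_aligner, GlobalAlignerMode

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AsymmetricCroCo3DStereo.from_pretrained(
    "naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt"  # checkpoint name from the README
).to(device)

# Two or more uncalibrated RGB images of the same scene.
images = load_images(["scene_a.jpg", "scene_b.jpg"], size=512)
pairs = make_pairs(images, scene_graph="complete", symmetrize=True)
output = inference(pairs, model, device, batch_size=1)

# One global optimization aligns all pairwise predictions, giving depth maps,
# camera poses and a consistent point cloud without separate task-specific models.
scene = global_aligner(output, device=device, mode=GlobalAlignerMode.PointCloudOptimizer)
scene.compute_global_alignment(init="mst", niter=300, schedule="cosine", lr=0.01)
depth_maps = scene.get_depthmaps()
camera_poses = scene.get_im_poses()
point_clouds = scene.get_pts3d()
```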
Human Centric Computer Vision
Visual models that reliably perceive and predict human pose, shape and activity from images or video, enabling safer, more natural human–robot interaction and 3D human generation.
Lifelong Learning for Visual Representation
Building visual perception systems that adapt continuously to new environments and tasks without forgetting past knowledge, whilst unifying encoders for effective embodied AI.
Visual Localization
Advancing robust camera pose estimation by matching images to 3D maps, and supporting the community with toolkits and datasets for location-based applications such as self-driving cars, autonomous robots and AR/VR.
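As an illustration of the geometric core of this task, the generic sketch below recovers a camera pose from 2D-3D matches with a PnP solver inside RANSAC. It uses OpenCV rather than any NAVER LABS toolkit, and the feature-matching step is replaced by synthetic correspondences so the example is self-contained.

```python
# Generic visual-localization sketch (not a NAVER LABS toolkit):
# recover the camera pose from 2D-3D matches via PnP + RANSAC.
import numpy as np
import cv2

# Synthetic setup: 3D map points and a ground-truth camera observing them.
rng = np.random.default_rng(0)
points_3d = rng.uniform(-1.0, 1.0, (100, 3)) + np.array([0.0, 0.0, 5.0])
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])          # pinhole intrinsics (illustrative)
rvec_gt = np.array([0.05, -0.02, 0.01])  # ground-truth rotation (axis-angle)
tvec_gt = np.array([0.10, -0.05, 0.20])  # ground-truth translation
points_2d, _ = cv2.projectPoints(points_3d, rvec_gt, tvec_gt, K, None)
points_2d = points_2d.reshape(-1, 2)

# Localization step: in a real system, points_2d/points_3d come from matching
# image features against the 3D map; here we solve for the pose directly.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    points_3d, points_2d, K, None, reprojectionError=3.0)
R, _ = cv2.Rodrigues(rvec)               # axis-angle -> rotation matrix
camera_center = (-R.T @ tvec).ravel()    # camera position in map coordinates
print(ok, len(inliers), camera_center)
```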
Action
Providing embodied agents with sequential decision-making capabilities to safely execute complex tasks in dynamic environments.
To make robots autonomous in real-world, everyday spaces, they must be able to learn, from their interactions within these spaces, how best to execute tasks specified by non-expert users in a safe and reliable way. Doing so requires sequential decision-making skills that combine machine learning, adaptive planning and control in uncertain environments, as well as the ability to solve hard combinatorial optimization problems. Our research combines expertise in reinforcement learning, computer vision, robotic control, sim2real transfer, large multimodal foundation models and neural combinatorial optimization to build AI-based architectures and algorithms that improve robot autonomy and robustness when completing complex everyday tasks in constantly changing environments.
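As a minimal illustration of the sequential decision-making loop this refers to, the sketch below runs a placeholder agent against a Gymnasium environment; any learning algorithm would slot in where the action is chosen and where the transition is recorded. The environment name is a stand-in, not one of our robot setups.

```python
# Minimal agent-environment interaction loop (Gymnasium API), the substrate on
# which reinforcement-learning agents are trained. Random policy as placeholder.
import gymnasium as gym

env = gym.make("CartPole-v1")  # stand-in task; a robot environment would go here
obs, info = env.reset(seed=0)
episode_return = 0.0
done = False
while not done:
    action = env.action_space.sample()  # a learned policy would act here
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    # a learner would store the transition and update its policy here
    done = terminated or truncated
env.close()
print(f"episode return: {episode_return}")
```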
Neural Combinatorial Optimization for Robot Fleet Management
Creating solutions to challenging combinatorial optimization problems for the coordination and management of robot fleets operating in real environments, and of the services they deliver.
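For flavour, here is a self-contained toy version of the kind of problem involved: assigning delivery tasks to a small fleet and ordering each robot's route with a nearest-neighbour heuristic. All names and numbers are illustrative; the research itself targets learned (neural) and exact solvers for far harder variants.

```python
# Toy fleet-routing sketch: greedily assign tasks to robots, then order each
# robot's tasks with a nearest-neighbour heuristic. Purely illustrative.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

robots = {"r1": (0.0, 0.0), "r2": (10.0, 10.0)}            # start positions
tasks = [(1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]   # delivery points

# Assignment: each task goes to the robot whose start position is closest.
assignment = {name: [] for name in robots}
for t in tasks:
    closest = min(robots, key=lambda name: dist(robots[name], t))
    assignment[closest].append(t)

# Routing: nearest-neighbour ordering of each robot's assigned tasks.
for name, todo in assignment.items():
    pos, route, remaining = robots[name], [], list(todo)
    while remaining:
        nxt = min(remaining, key=lambda t: dist(pos, t))
        remaining.remove(nxt)
        route.append(nxt)
        pos = nxt
    print(name, route)
```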
Foundation Models for Robot Navigation
End-to-end foundation models that enable robots to navigate diverse real-world environments without prior maps or special setup by modelling realistic agent behaviour and dynamics during learning.
Interaction
Equipping robots to interact safely with humans, other robots and systems.
For a robot to be useful it must be able to represent its knowledge of the world, share what it learns and interact with other agents, in particular humans. Our research in HRI, NLP, speech, IR, data management and low-code/no-code programming is targeted at building AI components that help robots perform complex real-world tasks. These components help robots interact safely with humans, their physical environment, and other robots and systems, and represent, update and share their world knowledge.
Multimodal NLP for Robotics
Developing multimodal natural language processing techniques that enable robots to robustly understand and generate human speech and text, making human–robot communication more natural, efficient and user-friendly.
Socially Aware Robot Navigation
Designing general navigation policies that let robots move naturally and safely in human-shared spaces by understanding social context and human behaviour.
LLMs for Robotics
Exploring how LLMs and multimodal extensions can enhance robotic services by bridging NLU, reasoning and task planning with real robot behaviour in complex real-world environments.
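As a sketch of this bridging pattern, the snippet below asks an LLM to turn a natural-language request into a sequence of whitelisted robot primitives, then validates the plan before execution. `call_llm` and the skill names are hypothetical placeholders, not a NAVER LABS API.

```python
# Hypothetical sketch of LLM-based task planning: the model maps a user request
# to a plan over a fixed vocabulary of robot skills, validated before execution.
# call_llm and the skill names are placeholders, not a real API.
import json

SKILLS = {"navigate_to", "pick", "place", "say"}

PROMPT = """You control a service robot with skills: navigate_to(place),
pick(object), place(object, place), say(text).
Return a JSON list of steps, e.g. [{{"skill": "pick", "args": ["cup"]}}].
Request: {request}"""

def call_llm(prompt: str) -> str:
    # Placeholder: in practice, a call to a hosted or local LLM goes here.
    return json.dumps([
        {"skill": "navigate_to", "args": ["kitchen"]},
        {"skill": "pick", "args": ["cup"]},
        {"skill": "navigate_to", "args": ["desk"]},
        {"skill": "place", "args": ["cup", "desk"]},
    ])

def plan(request: str) -> list:
    steps = json.loads(call_llm(PROMPT.format(request=request)))
    # Validate before touching the robot: only whitelisted skills are allowed.
    for step in steps:
        if step["skill"] not in SKILLS:
            raise ValueError(f"unknown skill: {step['skill']}")
    return steps

for step in plan("Bring a cup to my desk"):
    print(step["skill"], step["args"])
```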