DAN 23: 24th August 2023. Grand Intercontinental Seoul Parnas
TEAM NAVER CONFERENCE DAN is an event to showcase TEAM NAVER's technological vision and business plans to the world.
Generative AI is changing the world. To be more precise, it is changing the 'digital world'.
What changes are we facing these days? First, there are chatbots, which provide natural, human-like answers 24 hours a day, 365 days a year. There is also image-generating AI that pushes the boundaries of creativity, opening up new experiences not only for experts but also for people like me who aren't skilled at drawing. Avatars, too, have become more realistic. These are mainly examples of what is happening in the digital world.
But there is another world worth paying attention to: the physical world we live in. The methodologies used in generative AI can also be applied to the physical world, and there the potential is even greater. Today we would like to introduce our research on this.
AI research for the physical world
Until now, NAVER LABS' main research target has been the intersection between AI and the physical world. Our areas of focus can be summarized as 'action', where the robot performs tasks in various environments, 'vision', where the robot understands and recognizes its environment, and 'interaction' between robots and humans. One of our representative achievements is NAVER's new headquarters, '1784'. There, 100 robots provide a variety of robotic services, such as serving coffee to employees or delivering packages, and AI technology plays an important role: it should make the robots increasingly smart and versatile.
We are also expanding our scope beyond buildings to entire cities. This is where digital twin technology comes in, digitally replicating vast physical spaces in 3D. Digital twins are core data for smart-city applications such as service robots, urban simulation, autonomous driving, and AR navigation, and AI's ability to understand the complex physical world is just as important here. This will enable data-based urban planning and decision-making and help create smart, sustainable cities.
In this way, AI is already playing an important role in understanding the physical world. A few years ago, we began focusing on a new methodology to accelerate this research: the foundation model.
The decision to switch to foundation models
Researchers at NAVER LABS Europe made a very important decision in 2021: to convert the research projects carried out until then so that they would be based on foundation models. Foundation models are pre-trained on very large datasets; they are, so to speak, semi-finished products that can be adapted to a wide variety of purposes. We were strongly convinced that this methodology would become mainstream. Still, the decision was not an easy one, as the transition costs were high.
Why did we make this choice at the time? To explain, we must first talk about the limitations of traditional AI approaches.
Previous AI research primarily involved identifying a problem, collecting relevant data, and then training a neural network to solve it. Models built this way are difficult to apply, one by one, to the many situations that arise in real life. When details of the environment change, performance may degrade in the transition from training to deployment. From a service perspective, it is also hard to develop AI this way that accommodates the diverse needs of individual users.
To be more specific, let's take AI for robots as an example. From a technological standpoint, it is extremely difficult to get robots to perform tasks autonomously in an uncontrolled environment. An uncontrolled environment is, by definition, unpredictable, so finding a way to respond even when a robot stops working unexpectedly is very complicated. The more unexpected variables there are, the more we have to restrict the robot's choices, and the more its capabilities are limited.
The foundation model is the key to breaking through these limitations. By training it comprehensively on massive amounts of data and then fine-tuning it for specific purposes, you can minimize the effort required to apply it to new tasks. In fact, since switching to foundation models, our AI performance has improved significantly. More importantly, researchers in different domains have been able to create synergy with each other through the foundation model.
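To make the pre-train-then-fine-tune idea concrete, here is a minimal sketch in PyTorch. The backbone, dataset and 10-class task head are illustrative assumptions, not NAVER LABS' actual models or data: a generically pre-trained network is frozen and only a small task-specific head is trained for the new purpose.

```python
# Minimal sketch of the "pre-train once, fine-tune per task" pattern.
# Backbone, task and class count are hypothetical placeholders.
import torch
import torch.nn as nn
import torchvision.models as models

# 1) Start from a backbone pre-trained on a large, generic dataset.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# 2) Freeze the generic representation ...
for p in backbone.parameters():
    p.requires_grad = False

# 3) ... and attach a small task-specific head, e.g. for a hypothetical
#    10-class recognition task needed by a robot service.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(images, labels):
    """One gradient step that only updates the lightweight task head."""
    logits = backbone(images)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the small head is updated, adapting the same pre-trained backbone to another task mostly means swapping the head and the fine-tuning data, which is what keeps the per-task effort low.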
Foundation model that overcomes the complexity of reality
Of course, there are many difficulties. First of all, it is not easy to obtain data from the physical world, and since the real world is always changing, the data must be updated quickly. The speed of the transition between training and deployment is also an important issue. For these reasons, compared to its achievements in the digital world, foundation model research in the physical world still appears to be in its infancy globally. Fortunately, we have a head start: robots operating in our everyday spaces all the time, and a huge testbed. This gives us many advantages in securing and updating datasets. Thanks to this, we are working on many projects simultaneously, one of which is CroCo, a 3D foundation model for robots and digital twins.
CroCo stands for Cross-view Completion: it teaches AI to understand the real world using images of the same scene taken from different viewpoints, similar to how humans perceive 3D with both eyes. After training CroCo, you can fine-tune it and use it to make robots more adaptable to the complex physical world. For example, many robots connected to the cloud will not only provide good services in one specific space, but will also be able to provide good services in other spaces by adapting to environmental changes. Capabilities such as mobility also remain stable even when environmental information changes or is incomplete. And it can be applied far more effectively to mobility for robots at scale, which is impossible with existing AI methods alone. Ultimately, we want robots to be able to navigate space the way humans do when they need to get somewhere.
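As a rough illustration of the cross-view completion objective, the sketch below masks most of the patches in one view and asks a model to reconstruct them using a second view of the same scene as evidence. It is a simplified, hypothetical stand-in (the module names, sizes and pixel-reconstruction head are assumptions), not the actual CroCo architecture or training code.

```python
# Simplified sketch of a cross-view completion objective: reconstruct the
# masked patches of view 1 given an unmasked second view of the same scene.
import torch
import torch.nn as nn

class CrossViewCompletion(nn.Module):
    def __init__(self, dim=256, patch=16, img=224):
        super().__init__()
        n_patches = (img // patch) ** 2
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=4)
        dec = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec, num_layers=4)
        self.head = nn.Linear(dim, patch * patch * 3)  # predict raw pixels per patch

    def forward(self, view1, view2, mask):
        # Tokenise both views of the same scene.
        t1 = self.embed(view1).flatten(2).transpose(1, 2) + self.pos
        t2 = self.embed(view2).flatten(2).transpose(1, 2) + self.pos
        # Hide most of view 1: masked positions become a learned token.
        t1 = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(t1), t1)
        # The second, unmasked view supplies the cross-view evidence
        # needed to fill in what was hidden in the first view.
        memory = self.encoder(t2)
        decoded = self.decoder(t1, memory)
        return self.head(decoded)  # per-patch reconstruction targets

# Example with hypothetical shapes: two 224x224 views, ~90% of view 1 masked.
model = CrossViewCompletion()
v1, v2 = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
mask = torch.rand(2, (224 // 16) ** 2) < 0.9
pred = model(v1, v2, mask)  # (2, 196, 768) pixel predictions for the patches
```

Training on pairs like this forces the model to relate geometry across viewpoints, which is why the resulting representation transfers well to 3D tasks such as localization and navigation after fine-tuning.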
It's not just about understanding space. We can also expect big changes in how robots interact with people. For robots, humans are very complex to understand. If robots can understand human behavior and intentions in a variety of situations, it will become possible to provide robot services that allow safe interaction. This research is very important for popularizing robot services.
From one answer to a bigger future
We are solving many problems to help robots perform better in the everyday physical world. And once you solve one problem with a foundation model, you don't stop there: you can transfer that knowledge to solve a new problem. Where will this chain reaction lead? In the future, this methodology will allow 1,000 robots to perform 1,000 different tasks in the complex real world.
Our technology moves out of the lab and into everyday life. We will continue to work hard to ensure that everyone can use robots in their daily lives. The foundation model is picking up the pace.
This article was first published on the NAVER LABS website. The Forward Thinking series appears irregularly online and shares the knowledge and perspectives of outstanding researchers working with NAVER LABS on the major technological trends of this era, such as AI, robots, autonomous driving, and digital twins. www.naverlabs.com/forwardthinking
To make robots autonomous in real-world everyday spaces, they should be able to learn, from their interactions within these spaces, how to best execute tasks specified by non-expert users in a safe and reliable way. Doing so requires sequential decision-making skills that combine machine learning, adaptive planning and control in uncertain environments, as well as solving hard combinatorial optimization problems. Our research combines expertise in reinforcement learning, computer vision, robotic control, sim2real transfer, large multimodal foundation models and neural combinatorial optimization to build AI-based architectures and algorithms that improve robot autonomy and robustness when completing everyday complex tasks in constantly changing environments.
For a robot to be useful it must be able to represent its knowledge of the world, share what it learns and interact with other agents, in particular humans. Our research combines expertise in human-robot interaction, natural language processing, speech, information retrieval, data management and low code/no code programming to build AI components that will help next-generation robots perform complex real-world tasks. These components will help robots interact safely with humans and their physical environment, other robots and systems, represent and update their world knowledge and share it with the rest of the fleet.
Visual perception is a necessary part of any intelligent system that is meant to interact with the world. Robots need to perceive the structure, the objects, and people in their environment to better understand the world and perform the tasks they are assigned. Our research combines expertise in visual representation learning, self-supervised learning and human behaviour understanding to build AI components that help robots understand and navigate in their 3D environment, detect and interact with surrounding objects and people and continuously adapt themselves when deployed in new environments.
Details on the gender equality index score 2024 (related to year 2023) for NAVER France of 87/100.
1. Gender pay gap between women and men: 34/40 points
2. Gap in individual salary increases between women and men: 35/35 points
3. Salary increases upon return from maternity leave: not calculable
4. Number of employees of the under-represented gender among the 10 highest salaries: 5/10 points
The NAVER France targets set in 2022 (Indicator n°1: +2 points in 2024 and Indicator n°4: +5 points in 2025) have been achieved.
The research we conduct on expressive visual representations is applicable to visual search, object detection, image classification and the automatic extraction of 3D human poses and shapes that can be used for human behavior understanding and prediction, human-robot interaction or even avatar animation. We also extract 3D information from images that can be used for intelligent robot navigation, augmented reality and the 3D reconstruction of objects, buildings or even entire cities.
Our work covers the spectrum from unsupervised to supervised approaches, and from very deep architectures to very compact ones. We’re excited about the promise of big data to bring big performance gains to our algorithms but also passionate about the challenge of working in data-scarce and low-power scenarios.
Furthermore, we believe that a modern computer vision system needs to be able to continuously adapt itself to its environment and to improve itself via lifelong learning. Our driving goal is to use our research to deliver embodied intelligence to our users in robotics, autonomous driving, via phone cameras and any other visual means to reach people wherever they may be.