Dates: Thursday November 28th and Friday November 29th
Location: NAVER LABS Europe, 6 Chemin de Maupertuis, 38240 Meylan (Grenoble, France)
Remarkable results in computer vision, reinforcement learning, scene understanding and related fields have made it possible to equip robots with AI components so they can operate in the real world. This is a significant shift from the highly controlled environments to which robots were previously restricted, and from the narrowly defined tasks they performed there. In the not-too-distant future, we expect further advances in AI to integrate robots, and our interactions with them, into our everyday lives.
In this workshop, we address the question of how AI can help to solve the biggest challenges of real-world robotics applications, such as understanding and navigating complex dynamic environments, interacting with humans and learning to accomplish tasks autonomously. The event brings together researchers and experts from different AI and robotics disciplines to discuss current and future directions of these fields in an informal setting, fostering new connections and collaborations.
To make the workshop more interactive and give more people the opportunity to contribute actively, we will organize a poster session where PhD students (or anyone else) can present and discuss their work. For this, we will provide small travel grants of 300 euros on a first-come, first-served basis.
NAVER LABS Europe is the largest industrial AI research center in France, and its sister lab, NAVER LABS (Korea), is a leading robotics research organization with robots such as the M1X mapping robot, the service platform AROUND and the 5G-connected robot arm AMBIDEX. Both organizations are owned by NAVER Corporation, Korea’s leading internet company and 9th on Forbes’ 2018 list of the world’s most innovative companies.
Please note this is an invitation-only event as the number of guests is restricted. FOR INVITEES: Please register here.
Horst Bischof – Graz University of Technology – “Understanding long-term complex activities”
Christian Wolf – INSA & CNRS – “Integrating Learning and Projective Geometry for Robotics”
Alexandre Alahi – EPFL – “Socially-aware AI for Last-mile Mobility”
Sangbae Kim – MIT – “Robots with Physical Intelligence”
Daniel Cremers – Technische Universität München – “Direct Visual SLAM for Autonomous Systems”
Marc Pollefeys – ETH Zürich
Cordelia Schmid – INRIA & Google – “Learning to combine primitive skills: A versatile approach to robotic manipulation”
Sangok Seok – NAVER LABS – “New Connections Between People, Spaces and Information: Robotics, Autonomous driving, AI and 5G”
Torsten Sattler – Chalmers University of Technology
Radu Horaud – INRIA – “Audio-visual machine perception for socially interacting robots”
Vincent Lepetit – EPFL
Horst Bischof – “Understanding long-term complex activities”
Graz University of Technology
Abstract: Understanding complex human activities is a requirement for efficient human-robot interaction, as well as for several other tasks in a production environment. This talk will highlight the challenges that arise when analyzing complex human activities (e.g. assembly tasks) with a computer vision system. We will present our recent work in this area and describe some of the major research challenges, including training these systems with minimal supervision and finding suitable representations of complex activities.
Christian Wolf – “Integrating Learning and Projective Geometry for Robotics”
INSA & CNRS
Abstract: In this talk we address perception and navigation problems in robotics settings, in particular mobile terrestrial robots and intelligent vehicles. We focus on learning structured representations that support high-level reasoning about the presence of objects and actors in a scene and inform planning and control decisions. Two different methods will be compared, both of which structure their state as metric maps in a bird’s-eye view, updated through affine transforms given ego-motion.
The first method combines Bayesian filtering and Deep Learning to fuse LIDAR and monocular RGB input, resulting in a semantic occupancy grid centered on a vehicle. A deep network is trained to segment the RGB input and to fuse it with Bayesian occupancy grids. The second method automatically learns robot navigation in 3D environments from interactions and reward using Deep Reinforcement Learning. Similar to the first method, it keeps a metric map of the environment in a bird’s-eye view, which is dynamically updated. Unlike the first method, the semantic meaning of the map’s content is not determined beforehand or learned from supervision. Instead, projective geometry is used as an inductive bias in deep neural networks. The content of the metric map is learned from interactions and reward, allowing the agent to discover regularities and object affordances from the task itself.
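As a miniature illustration of the map maintenance this abstract describes, the sketch below keeps an ego-centric occupancy grid in log-odds form, applies a standard Bayesian update per observed cell, and re-centers the grid after ego-motion (translation only, the simplest case of the affine update). The grid size, sensor probability and motion are toy assumptions for illustration, not the speaker's implementation:

```python
import math

GRID = 5  # small ego-centric grid; the robot sits at the centre cell

def make_map():
    # log-odds occupancy map; 0.0 means unknown (p = 0.5)
    return [[0.0] * GRID for _ in range(GRID)]

def update_cell(m, r, c, p_occ):
    # Bayesian log-odds update for one observed cell
    m[r][c] += math.log(p_occ / (1.0 - p_occ))

def shift(m, dr, dc):
    # Re-centre the map after an ego-motion of (dr, dc) cells;
    # cells that fall outside the grid revert to unknown.
    out = make_map()
    for r in range(GRID):
        for c in range(GRID):
            sr, sc = r + dr, c + dc
            if 0 <= sr < GRID and 0 <= sc < GRID:
                out[r][c] = m[sr][sc]
    return out

def prob(m, r, c):
    # convert log-odds back to an occupancy probability
    return 1.0 / (1.0 + math.exp(-m[r][c]))

m = make_map()
update_cell(m, 1, 2, 0.9)  # a sensor says cell (1, 2) is likely occupied
update_cell(m, 1, 2, 0.9)  # a second agreeing measurement reinforces it
m = shift(m, 1, 0)         # the robot moved one cell forward
print(round(prob(m, 0, 2), 3))  # → 0.988
```

The log-odds form makes fusing repeated (or multi-sensor) observations a simple addition, which is why occupancy grids conventionally use it.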
We also present a new benchmark and a suite of tasks requiring complex reasoning and exploration in continuous, partially observable 3D environments. The objective is to provide challenging scenarios and a robust baseline agent architecture that can be trained on mid-range consumer hardware in under 24 hours. Solving our tasks requires substantially more complex reasoning capabilities than standard benchmarks available for this kind of environment.
Alexandre Alahi – “Socially-aware AI for Last-mile Mobility”
Abstract: Artificial Intelligence (AI) is poised to reshape the future of mobility with autonomous moving platforms tackling the “last mile” problem. Integration of these AI-driven systems into our society remains a grand challenge: they not only need to perform transportation tasks, but also to do so in close proximity with humans in the open world. Current AI implementations are not yet safe and do not inspire trust in the population, due to unacceptable fatal accidents involving self-driving cars. Delivery and social robots face similar issues, either freezing in crowded scenes or recklessly forcing humans to move away. To address these challenges, AI must go beyond classification tasks and develop broader cognition: learn and obey unwritten common-sense rules and comply with social conventions in order to gain human trust. Robots should respect personal space, yield right-of-way, and ultimately “read” the behavior of others to effectively navigate crowded spaces. I will present a new type of cognition I call socially-aware AI to address these challenges.
Sangbae Kim – “Robots with Physical Intelligence”
Abstract: While industrial robots are effective in repetitive, precise kinematic tasks in factories, their design and control are not suited for the physically interactive tasks that humans perform easily. Such tasks require ‘physical intelligence’, achieved through complex dynamic interactions with the environment, whereas conventional robots are designed primarily for position control. To develop a robot with ‘physical intelligence’, we first need a new type of machine that allows dynamic interactions. This talk will discuss how a new design paradigm enables dynamic interactive tasks. As an embodiment of this paradigm, the latest version of the MIT Cheetah robots and force-feedback teleoperation arms will be presented. These robots are equipped with proprioceptive actuators, a new design paradigm for dynamic robots. This new class of actuators will play a crucial role in developing ‘physical intelligence’ and in future robot applications such as elderly care, home service, delivery, and services in environments unfavorable for humans.
Daniel Cremers – “Direct Visual SLAM for Autonomous Systems”
Technische Universität München
Cordelia Schmid – “Learning to combine primitive skills: A versatile approach to robotic manipulation”
INRIA & Google
Abstract: Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision. Traditional task and motion planning (TAMP) methods can solve complex tasks but require full state observability and are not adapted to dynamic scene changes. Recent learning methods can operate directly on visual inputs but typically require many demonstrations and/or task-specific reward engineering. In this work we aim to overcome previous limitations and propose a reinforcement learning (RL) approach to task planning that learns to combine primitive skills. First, compared to previous learning methods, our approach requires neither intermediate rewards nor complete task demonstrations during training. Second, we demonstrate the versatility of our vision-based task planning in challenging settings with temporary occlusions and dynamic scene changes. Third, we propose an efficient training of basic skills from few synthetic demonstrations by exploring recent CNN architectures and data augmentation. Notably, while all of our policies are learned on visual inputs in simulated environments, we demonstrate the successful transfer and high success rates when applying such policies to manipulation tasks on a real UR5 robotic arm. This is joint work with R. Strudel, A. Pashevich, I. Kalevatykh, I. Laptev, J. Sivic.
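As a toy, hedged illustration of the idea of learning to combine primitive skills from a sparse terminal reward: in the sketch below, a tabular Q-learning “master policy” learns which of three skills to invoke at each stage of a sequential task. The skill names, the three-stage task and the tabular formulation are assumptions made for this sketch; the method described in the talk operates on visual inputs and is far more general:

```python
import random

random.seed(0)  # deterministic toy run

# Hypothetical primitive skills for a pick-up task (illustrative only)
SKILLS = ["approach", "grasp", "lift"]
N_STATES = 4  # stages 0..2 plus terminal state 3

def step(state, action):
    # Executing the skill that matches the current stage advances the task;
    # reward arrives only when the whole sequence is completed.
    if action == state:
        state += 1
        return state, (1.0 if state == 3 else 0.0)
    return state, 0.0

# Tabular Q-values for the master policy: Q[state][skill]
Q = [[0.0] * len(SKILLS) for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):  # training episodes
    s = 0
    while s != 3:
        if random.random() < eps:
            a = random.randrange(len(SKILLS))                       # explore
        else:
            a = max(range(len(SKILLS)), key=lambda i: Q[s][i])      # exploit
        s2, r = step(s, a)
        # standard Q-learning update
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy skill per stage after training
plan = [SKILLS[max(range(len(SKILLS)), key=lambda i: Q[s][i])] for s in range(3)]
print(plan)
```

Note that no intermediate rewards are given: the value of early skills is learned purely by bootstrapping from the terminal reward, which mirrors the abstract's point that neither intermediate rewards nor complete task demonstrations are needed during training.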
Sangok Seok – “New Connections Between People, Spaces and Information: Robotics, Autonomous driving, AI and 5G”
Abstract: The goal of this talk is to introduce a variety of the latest achievements of NAVER LABS, NAVER’s R&D corporation, as well as the direction in which NAVER LABS is heading. NAVER, which has specialized in online platforms, is pursuing core technologies in the fields of robotics, autonomous driving and artificial intelligence in order to bring its services naturally to the offline platforms (physical spaces) of our daily lives. NAVER LABS is therefore focusing on future technologies that bring information and services closer to people. In particular, precise spatial data (machine-readable 3D/HD maps) is imperative for robots and autonomous driving machines to interact with people and provide services. NAVER LABS has developed a novel in-house solution to create 3D high-precision maps of various environments, including indoor spaces, outdoor spaces and even roads, through technologies such as mapping robots, mobile mapping systems (MMS) and aerial photogrammetry. These technologies generate the essential data that allow for the safe and precise autonomous driving of robots and vehicles. They also provide the core data for visual localization (VL) technology, which can accurately recognize a location from a single photograph in places where GPS signals are unavailable, such as indoor environments and skyscraper-dense areas. Based on these high-precision spatial data and localization technologies, NAVER LABS is building an expandable mobile platform by incorporating cloud computing, computer vision-based deep learning, advanced driver-assistance systems (ADAS) and human-robot interaction (HRI). Through continuous innovation, NAVER LABS has demonstrated 5G brainless robot technologies, 3D augmented reality head-up displays (AR HUDs) and indoor AR navigation systems. NAVER LABS is constantly striving to offer the technologies of the future to the world.
The new connections between people, spaces and information is the vision of NAVER LABS, and the future of our lives.
Radu Horaud – “Audio-visual machine perception for socially interacting robots”
Abstract: In this talk I will give an overview of the research carried out by the Perception team at Inria Grenoble over the past five years. I will start by stating the scientific challenges of socially interactive robots, as opposed to the commercially available interactive devices of today, which are essentially based on speech technologies. I will discuss the difficulties of multiple users interacting with a robot, as opposed to a single user. I will emphasize the complementary roles of visual and audio perception, and I will address in detail the problems associated with fusing these two modalities in unrestricted settings.