Martin Humenberger | 2020
Visual localization, which estimates the position and orientation of a camera in a map from query images, and structure from motion (SfM), one of the most popular ways to build such a map, are fundamental components of technologies such as autonomous robots and self-driving vehicles.
However, a major barrier to research in visual localization and SfM lies in the format of the data itself. Although many good public datasets for evaluation now exist, these datasets are all structured using different formats. As a result, data importers and exporters often have to be modified, and coordinate systems or camera parameters almost always have to be transformed. Even more data conversion is necessary if you want to combine multiple tools in a single pipeline. In addition, existing data formats often don't include the types of data needed for a specific application, especially data from WiFi or other non-image sensors.
To overcome this research barrier, we created the kapture format. Kapture supports a wide range of data types and comes with several format converters and data processing tools. On the one hand, it makes public datasets easier to use; on the other hand, it allows processed data (such as local or global features or matches) to be shared easily, again in one common format. By sharing kapture with the research community, our aim is to facilitate future research and development in topics such as visual localization, SfM, VSLAM, and sensor fusion.
The release contains the following main features: the kapture data format itself, Python tools to load, save, and convert data, example mapping and localization pipelines, and a number of public datasets already converted to kapture.
Given a known 3D space representation, visual localization is the problem of estimating the position and orientation of a camera using query images. Structure from Motion (SfM) is one of the most popular ways to reconstruct a 3D scene from an unordered set of images.
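To make the localization problem concrete, here is a minimal, purely illustrative sketch (not part of kapture) of the core pose-estimation step: once 2D-3D correspondences between a query image and the map are available, the camera pose can be recovered with a RANSAC-based PnP solver such as OpenCV's. The correspondences and intrinsics below are made up.

```python
import numpy as np
import cv2

# Hypothetical 2D-3D correspondences: in a real system they come from
# matching local features of the query image against the 3D map.
points_3d = np.random.rand(50, 3).astype(np.float32) * 10.0   # map points
points_2d = np.random.rand(50, 2).astype(np.float32) * 640.0  # detections (px)

# Made-up pinhole intrinsics of the query camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]], dtype=np.float32)

# Robustly estimate the camera pose (rotation rvec, translation tvec).
ok, rvec, tvec, inliers = cv2.solvePnPRansac(points_3d, points_2d, K, None)
if ok:
    R, _ = cv2.Rodrigues(rvec)   # world-to-camera rotation matrix
    center = -R.T @ tvec         # camera position in map coordinates
    print('estimated camera center:', center.ravel())
```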
Kapture can be used to store many types of data collected for visual localization and SfM, such as images, camera parameters, camera poses, and non-image sensor data (e.g., WiFi measurements). It can also store data computed during the various stages of the process, such as local and global image features, matches, and reconstructed 3D points.
Kapture also includes a set of Python tools to load, save, and convert datasets to and from kapture. General formats such as COLMAP, openmvg, OpenSfM, bundler, image_folder, image_list, and nvm, as well as a few formats specific to particular datasets such as IDL_dataset_cvpr17, RobotCar_Seasons, ROSbag cameras+trajectory, SILDa, and virtual_gallery are all supported.
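As a quick illustration of these Python tools, the following sketch loads a dataset stored in kapture format and prints a few statistics. It assumes the kapture package is installed and that ./my_dataset is a hypothetical dataset root in the kapture directory layout; the module paths follow the current kapture API but may evolve across versions.

```python
# Minimal sketch: load a kapture dataset from disk and inspect its contents.
from kapture.io.csv import kapture_from_dir

# Hypothetical path to a dataset stored in the kapture directory layout.
kapture_data = kapture_from_dir('./my_dataset')

# Sensor definitions (cameras, and possibly WiFi, GNSS, ... sensors).
print(len(kapture_data.sensors), 'sensors')

# Recorded poses and images, if present in the dataset.
if kapture_data.trajectories is not None:
    print(len(kapture_data.trajectories), 'timestamps with poses')
if kapture_data.records_camera is not None:
    print(len(kapture_data.records_camera), 'timestamps with images')
```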
Our example pipelines take you through the process of localizing query images on a map, which consists of two major parts: i) building the map and ii) localizing a query. In each pipeline, we first show you how to build the map using SfM and known poses, and then we show you how to localize query images. We also explain how to use kapture to evaluate the precision of the obtained localization against the ground truth.
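For context, the evaluation step essentially compares every estimated camera pose with its ground-truth counterpart. The sketch below is a generic illustration of that computation (it is not the kapture evaluation tool itself); the poses used are toy values.

```python
# Generic sketch: position and rotation error between an estimated camera
# pose and the ground truth, both given as world-to-camera (R, t).
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (position error in map units, rotation error in degrees)."""
    # Camera centers recovered from the world-to-camera poses.
    c_est = -R_est.T @ t_est
    c_gt = -R_gt.T @ t_gt
    position_error = np.linalg.norm(c_est - c_gt)
    # Angle of the relative rotation between estimate and ground truth.
    cos_angle = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    rotation_error = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return position_error, rotation_error

# Toy example: estimate is 0.1 units off and rotated 1 degree about z.
angle = np.radians(1.0)
R_gt, t_gt = np.eye(3), np.zeros(3)
R_est = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
t_est = np.array([0.1, 0.0, 0.0])
print(pose_errors(R_est, t_est, R_gt, t_gt))  # approx. (0.1, 1.0)
```

Benchmarks then typically report the percentage of query images localized within given position and rotation error thresholds.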
We’ve already converted a number of datasets to the kapture format. For instance, kapture can be used to process all the datasets included in ECCV 2020’s Visual Localization Challenge (the Aachen Day-Night, InLoc, RobotCar Seasons, Extended CMU-Seasons, and SILDa Weather and Time of Day datasets). These datasets are meant to cover the many challenging scenarios that arise in real-world situations, including changes in the time of day and season of the year as well as outdated reference representations. They also include images with occlusion, motion blur, extreme viewpoint changes, and low-texture areas.
If you already have your SfM or visual localization tools up and running, you only need to integrate kapture support once; after that, you can use all of these datasets without writing any additional conversion or glue code. Additional details, as well as an up-to-date list of available datasets, can be found in the kapture tutorial.
If you find kapture useful, we encourage contributions!
You are welcome to provide your own dataset in kapture format (we’re happy to help), write new data converters, report bugs and suggest improvements, provide processed data (e.g., extracted features or matches) in kapture format, and add support for additional data types.
More information, news, and updates can be found on our website.
GitHub Repository: https://github.com/naver/kapture
License: BSD 3-Clause
Paper: M. Humenberger, Y. Cabon, N. Guérin, J. Morat, J. Revaud, P. Rerole, N. Pion, C. de Souza, V. Leroy and G. Csurka: Robust Image Retrieval-based Visual Localization using kapture