Visual Localization


Visual localization is an important component of many location-based systems such as self-driving cars, autonomous robots, and augmented, mixed and virtual reality. The goal is to accurately estimate the position and orientation of a camera from captured images. More precisely, correspondences between a representation of the environment (a 3D map) and a query image are used to estimate the camera pose.

Visual localization methods must cope with environmental changes, e.g. differences in time of day or season, structural changes such as modified building facades or storefronts, moving objects that occlude parts of the scene (cars, people, etc.), as well as large changes in viewpoint between the mapping and query images. To help address these issues, we have released several datasets, such as the Virtual Gallery dataset and a large-scale localization dataset captured in crowded indoor spaces.

NAVER LABS large-scale localization dataset.


Structure-based visual localization methods use local features, such as our R2D2 (Repeatable and Reliable Detector and Descriptor) and PUMP (pyramidal and uniqueness matching priors for unsupervised learning of local features), to establish correspondences between 2D query images and 3D reconstructions. These correspondences are then used to compute the camera pose with a perspective-n-point (PnP) solver inside a RANSAC loop.
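To make the PnP-inside-RANSAC step concrete, here is a minimal numpy sketch. It is not the implementation behind R2D2 or PUMP: it uses a simple Direct Linear Transform (DLT) pose solver on normalized image coordinates (in practice one would typically call a dedicated solver such as OpenCV's `solvePnPRansac`), and the thresholds and iteration counts are illustrative.

```python
import numpy as np

def dlt_pnp(X, x):
    """Direct Linear Transform pose estimation from n >= 6 correspondences.
    X: (n, 3) 3D map points; x: (n, 2) normalized image coordinates
    (pixel coordinates with the intrinsics K already removed)."""
    n = X.shape[0]
    Xh = np.hstack([X, np.ones((n, 1))])          # homogeneous 3D points
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh                             # row for u: p1.X - u * p3.X = 0
    A[0::2, 8:12] = -x[:, 0:1] * Xh
    A[1::2, 4:8] = Xh                             # row for v: p2.X - v * p3.X = 0
    A[1::2, 8:12] = -x[:, 1:2] * Xh
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)                      # null vector = projection matrix up to scale
    M, t = P[:, :3], P[:, 3]
    U, S, Vt2 = np.linalg.svd(M)
    Q = U @ Vt2                                   # orthogonal polar factor of M
    sgn = np.sign(np.linalg.det(Q))               # recover the sign of the global scale
    return sgn * Q, sgn * t / S.mean()

def pnp_ransac(X, x, n_iters=200, thresh=0.01, seed=0):
    """Robust pose estimation: solve PnP on minimal 6-point samples, keep the
    hypothesis with the most inliers, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_inliers = None
    for _ in range(n_iters):
        sample = rng.choice(n, size=6, replace=False)
        R, t = dlt_pnp(X[sample], x[sample])
        cam = X @ R.T + t                         # transform points into camera frame
        proj = cam[:, :2] / cam[:, 2:3]           # perspective division
        err = np.linalg.norm(proj - x, axis=1)    # reprojection error
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    R, t = dlt_pnp(X[best_inliers], x[best_inliers])
    return R, t, best_inliers
```

The RANSAC loop is what makes the pose robust to mismatched local features: any hypothesis fit on a sample containing an outlier correspondence gathers few inliers and is discarded.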

To reduce the search range in large 3D reconstructions, image retrieval methods such as Deep Image Retrieval, Super-Features or Weatherproofing can be used to first retrieve the most relevant images from the Structure-from-Motion (SfM) model. Local correspondences are then established only in the area defined by those images.
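The retrieval step reduces to a nearest-neighbor search over global image descriptors. A minimal sketch, assuming descriptors have already been extracted by some retrieval model (the descriptor dimension and similarity measure here are illustrative choices, not those of any particular method above):

```python
import numpy as np

def retrieve_top_k(query_desc, db_descs, k=5):
    """Return indices of the k database images most similar to the query,
    ranked by cosine similarity between L2-normalized global descriptors."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q                     # cosine similarity to every database image
    return np.argsort(-sims)[:k]      # indices of the k highest similarities
```

Only the map points visible in these top-k images then need to be considered when matching local features, which is what makes localization tractable at city scale.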

Scene point regression methods like SACReg establish the 2D-3D correspondences with a deep neural network (DNN), while absolute pose regression methods directly estimate the camera pose with a DNN.
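For absolute pose regression, the network head typically outputs a translation plus a rotation parameterization such as a quaternion. The sketch below shows that output decoding and a PoseNet-style training loss in plain numpy; it is a hedged illustration of the general recipe, not the loss of any specific method mentioned above, and the weight `beta` is an arbitrary example value.

```python
import numpy as np

def decode_pose(raw):
    """Split a raw 7-D regression output into translation and unit quaternion."""
    t, q = raw[:3], raw[3:]
    q = q / np.linalg.norm(q)          # normalize: only unit quaternions are rotations
    return t, q

def pose_loss(raw, t_gt, q_gt, beta=250.0):
    """PoseNet-style loss: translation error plus weighted rotation error.
    q and -q encode the same rotation, so take the smaller of the two distances."""
    t, q = decode_pose(raw)
    rot_err = min(np.linalg.norm(q - q_gt), np.linalg.norm(q + q_gt))
    return np.linalg.norm(t - t_gt) + beta * rot_err
```

The quaternion sign ambiguity handled in `pose_loss` matters in practice: without it, a network predicting a rotation perfectly but with flipped sign would be heavily penalized.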

Another strategy is to estimate the pose of an image by directly aligning deep image features with a reference 3D model, as we have done with SegLoc and NeRF for camera pose refinement.

Image retrieval can also be used for visual localization when no 3D map is available. The camera pose of a query image can then be computed from the poses of the top retrieved database images: by interpolating the retrieved image poses, by estimating the relative pose between the query and retrieved images, or by building local 3D models on the fly (see our Image Retrieval Benchmark).
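The first of these options, pose interpolation, can be sketched in a few lines. This is a generic weighted blend of neighbor poses (weights would typically come from retrieval similarity scores), not the exact scheme of the benchmark; quaternion signs are aligned before averaging because q and -q represent the same rotation.

```python
import numpy as np

def interpolate_pose(ts, qs, weights):
    """Estimate a query pose from the poses of top-retrieved database images:
    weighted mean of translations and a weighted quaternion blend.
    ts: (n, 3) translations; qs: (n, 4) unit quaternions; weights: (n,)."""
    w = np.asarray(weights, float)
    w = w / w.sum()                               # normalize weights
    t = (w[:, None] * np.asarray(ts, float)).sum(axis=0)
    qs = np.asarray(qs, float)
    signs = np.sign(qs @ qs[0])                   # align each quaternion with the first
    signs[signs == 0] = 1.0
    q = (w[:, None] * signs[:, None] * qs).sum(axis=0)
    return t, q / np.linalg.norm(q)               # renormalize the blended quaternion
```

Such interpolation is coarse but needs no 3D structure at all, which is exactly the appeal of retrieval-only localization.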

Furthermore, objects or semantic information can also be used for visual localization as detailed in our work on Objects of Interest (OOI).

To help the visual localization community advance further, we released Kapture, a data format and toolbox to make it easier to use public datasets, create maps and re-localize images.
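Kapture stores maps and poses as plain CSV-like text files, which makes them easy to inspect and convert. As a hedged illustration, the parser below reads one record from a trajectories file, assuming the field layout `timestamp, device_id, qw, qx, qy, qz, tx, ty, tz`; please check the kapture format specification before relying on this layout.

```python
def parse_trajectory_line(line):
    """Parse one record from a kapture-style trajectories file.
    Assumed field layout (verify against the kapture format spec):
    timestamp, device_id, qw, qx, qy, qz, tx, ty, tz"""
    fields = [f.strip() for f in line.split(",")]
    timestamp, device_id = int(fields[0]), fields[1]
    qw, qx, qy, qz, tx, ty, tz = map(float, fields[2:9])
    return {"timestamp": timestamp,
            "device": device_id,
            "rotation": (qw, qx, qy, qz),      # camera orientation as a quaternion
            "translation": (tx, ty, tz)}       # camera position

# Example record (hypothetical values):
record = parse_trajectory_line("0, cam0, 1, 0, 0, 0, 0.5, 0.0, 1.0")
```

Keeping everything in simple text files is what lets Kapture act as a common interchange format between public datasets, mapping pipelines and localization tools.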

The kapture visual localization toolbox.
