Weatherproofing Retrieval for Localization with Generative AI & Geometric Consistency

Ret4Loc in three figures: (left) overview of our experimental validation, (middle) synthetic variants used to extend the training set, and (right) geometric consistency used to select those variants. More precisely: (Left) Relative gains in localization accuracy over the state of the art (black dot) on 7 outdoor datasets and 1 indoor dataset, achieved by our best retrieval models trained with our method using only real images (Ret4Loc) or both real and synthetic images (Ret4Loc+Synth). Axes are in log scale. (Middle) Original images and several of their synthetic variants obtained with different prompts. (Right) Estimated local correspondences between two matching images, before and after one of them is replaced by a synthetic variant; variants that fail this geometric verification are discarded from the training set.
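To make the verification step concrete, here is a minimal sketch of such a geometric consistency check, assuming off-the-shelf SIFT features, Lowe's ratio test, and a RANSAC homography as the verification model. The paper's actual local features, matching rules, and acceptance threshold may differ; the MIN_INLIERS constant below is a hypothetical cutoff.

```python
import cv2
import numpy as np

MIN_INLIERS = 30  # hypothetical acceptance threshold, not from the paper

def passes_geometric_check(image_path_a: str, image_path_b: str) -> bool:
    """Match local features between two images (e.g. a real image and a
    synthetic variant of its matching pair), then verify the matches with
    a RANSAC homography. Too few geometrically consistent matches means
    the synthetic variant is rejected."""
    img_a = cv2.imread(image_path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(image_path_b, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kpts_a, desc_a = sift.detectAndCompute(img_a, None)
    kpts_b, desc_b = sift.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return False

    # Lowe's ratio test on 2-nearest-neighbor matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]
    if len(good) < 4:  # a homography needs at least 4 correspondences
        return False

    pts_a = np.float32([kpts_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kpts_b[m.trainIdx].pt for m in good])
    _, inlier_mask = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, 5.0)
    inliers = int(inlier_mask.sum()) if inlier_mask is not None else 0
    return inliers >= MIN_INLIERS
```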

Summary

State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet, retrieval often struggles under varying conditions, e.g. changes in weather or time of day, with dramatic consequences for visual localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes we advocate, we propose to synthesize variants of the training-set images with generative text-to-image models, in order to automatically expand the training set towards a number of nameable variations that particularly hurt visual localization. These changes result in Ret4Loc, a training approach that learns from such synthetic variants together with real images and that leverages geometric consistency for filtering and sampling. Experiments show that it leads to large improvements on multiple challenging visual localization and place recognition benchmarks.
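As an illustration of the synthesis step, the sketch below generates weather and time-of-day variants of a training image with an image-to-image diffusion pipeline from Hugging Face diffusers. The checkpoint name, prompt list, and strength value are assumptions for illustration; the paper does not necessarily use this exact model or these exact settings.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Assumed checkpoint; the paper may rely on a different text-to-image model.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Nameable conditions known to hurt visual localization.
prompts = [
    "the same street scene at night",
    "the same street scene in heavy rain",
    "the same street scene covered in snow",
    "the same street scene in dense fog",
]

init_image = Image.open("train_image.jpg").convert("RGB").resize((512, 512))

variants = []
for prompt in prompts:
    # A moderate strength keeps the scene geometry while changing appearance.
    out = pipe(prompt=prompt, image=init_image, strength=0.5,
               guidance_scale=7.5)
    variants.append(out.images[0])
```

In a full pipeline, each variant would then be screened with a geometric consistency check like the one sketched above before entering the training set.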

Visual Localization Results

Figure-2: Localization accuracy as a function of the number of top-k retrieved images, for Ret4Loc models and the state of the art. Results are shown for two protocols: a pose approximation protocol (EWB) and a Structure-from-Motion (SfM) based protocol. Ret4Loc-HOW-Synth variants that use geometric consistency are denoted with a "+".
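For context, the EWB protocol approximates the query pose directly from the poses of the top-k retrieved database images. Below is a minimal sketch of such an equally weighted barycenter, assuming poses are given as camera centers plus unit quaternions; the benchmark's exact rotation-averaging scheme may differ.

```python
import numpy as np

def ewb_pose(centers: np.ndarray, quats: np.ndarray):
    """Equal Weighted Barycenter (EWB) pose approximation: the query
    pose is the unweighted average of the top-k retrieved images' poses.

    centers: (k, 3) camera centers; quats: (k, 4) unit quaternions.
    """
    center = centers.mean(axis=0)

    # Naive quaternion averaging: align signs to the first quaternion
    # (q and -q encode the same rotation), average, then renormalize.
    signs = np.sign(quats @ quats[0])
    signs[signs == 0] = 1.0
    q_mean = (quats * signs[:, None]).mean(axis=0)
    q_mean /= np.linalg.norm(q_mean)
    return center, q_mean
```

With k = 1 this reduces to copying the pose of the best-ranked image, which is why accuracy in the figure depends on the number of retrieved images.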

Place Recognition Results

Figure-3: Visual place recognition results. We report the usual metric (top-k recall), i.e. whether at least one correct image is retrieved among the top-k results. ∗ denotes results from GCL (Leyva-Vallina et al., 2023).
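The top-k recall reported above can be computed as in the short sketch below, assuming a ranked retrieval list and a set of ground-truth positive database indices per query.

```python
def recall_at_k(rankings, positives, k: int) -> float:
    """Fraction of queries with at least one correct image among the
    top-k retrieved database images.

    rankings: list of ranked database indices per query.
    positives: list of sets of correct database indices per query.
    """
    hits = sum(
        1 for ranked, pos in zip(rankings, positives)
        if pos and set(ranked[:k]) & pos
    )
    return hits / len(rankings)

# Toy example: the first query hits within the top 2, the second misses,
# so recall@2 = 0.5.
print(recall_at_k([[3, 7, 1], [5, 0, 2]], [{7}, {9}], k=2))
```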

Pretrained Models

Here you can find links to the Ret4Loc pretrained models. Our codebase is built on top of the HOW codebase, so you can use code from HOW to load and evaluate the Ret4Loc models.

We provide two model weights (33 MB each).

You can load our models exactly like a HOW model.
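Since the checkpoints follow the HOW format, loading them looks roughly like the sketch below. This is a minimal sketch assuming a standard PyTorch checkpoint that bundles the network configuration with a state_dict; the builder entry point (the commented-out init_network) and the checkpoint keys are hypothetical and should be checked against the HOW repository.

```python
import torch

# Hypothetical entry point; consult the HOW codebase for the actual
# network builder and the configuration it expects.
# from how.networks.how_net import init_network

# Assumed filename for one of the provided checkpoints.
checkpoint = torch.load("ret4loc_how.pth", map_location="cpu")

# Inspect what the checkpoint actually contains before wiring it up.
print(sorted(checkpoint.keys()))

# net = init_network(**checkpoint["net_params"])   # assumed key
# net.load_state_dict(checkpoint["state_dict"])    # assumed key
# net.eval()
```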
