Weatherproofing retrieval for localization with generative AI and geometric consistency

Published by Yannis Kalantidis on 7 May 2024

Yannis Kalantidis, Mert Bulent Sariyildiz, Rafael Sampaio De Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka

The International Conference on Learning Representations (ICLR), Vienna, Austria, 7-11 May, 2024

State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet, retrieval often struggles when facing varying conditions, due to e.g. weather or time of day, with dramatic consequences on the visual localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes we advocate for, we propose to synthesize variants of the training set images, obtained from generative text-to-image models, in order to automatically expand the training set towards a number of nameable variations that particularly hurt visual localization. These changes result in Ret4Loc, a training approach that learns from such synthetic variants together with real images and that leverages geometric consistency for filtering and sampling. Experiments show that it leads to large improvements on multiple challenging visual localization and place recognition benchmarks.

Relative gains in localization accuracy

Compared to the state of the art (black dot), using our Ret4Loc models trained with real (Ret4Loc) or real+synthetic images (Ret4Loc+Synth).

Synthetic variants

For the training set images shown on the left.

Geometric correspondences

Computed before (top) and after (bottom) replacing the left image with one of its synthetic variants.

Ret4Loc in three figures: (left) overview of our experimental validation, (middle) synthetic variants used to extend the training set, and (right) geometric consistency used to select those variants. More precisely: (Left) Relative gains in localization accuracy compared to the state of the art (black dot) for 7 outdoor datasets and 1 indoor dataset, achieved by our best retrieval models trained with our method using only real images (Ret4Loc) or real and synthetic images (Ret4Loc+Synth). Axes are in log scale. (Middle) Original images and several of their synthetic variants obtained for different prompts. (Right) Estimated local correspondences between two matching images before and after alteration. This verification is used to discard from the training set the synthetic variants that fail it.
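To make the filtering idea in the right panel concrete, here is a minimal sketch of how correspondence counts before and after alteration could be used to keep or discard a synthetic variant. It is not the criterion used in the paper: the `VariantCheck` structure, the `keep_variant` rule, both thresholds, and the prompt names in the example are hypothetical and chosen for illustration only. The correspondence counts are assumed to come from an external local feature matcher.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VariantCheck:
    """Correspondence counts for one synthetic variant (hypothetical structure)."""
    prompt: str           # prompt used to generate the variant, e.g. "snow"
    matches_before: int   # correspondences between the two original images
    matches_after: int    # correspondences after replacing one image by the variant


def keep_variant(check: VariantCheck,
                 min_matches: int = 20,
                 min_ratio: float = 0.5) -> bool:
    """Illustrative rule: keep the variant if enough correspondences survive.

    Both thresholds are assumptions made for this sketch, not values from the paper.
    """
    if check.matches_before <= 0:
        # Without reference correspondences there is nothing to verify against.
        return False
    ratio = check.matches_after / check.matches_before
    return check.matches_after >= min_matches and ratio >= min_ratio


def filter_variants(checks: Iterable[VariantCheck]) -> List[str]:
    """Return the prompts of the variants that pass the check."""
    return [c.prompt for c in checks if keep_variant(c)]


if __name__ == "__main__":
    checks = [
        VariantCheck("snow", matches_before=120, matches_after=90),
        VariantCheck("night", matches_before=120, matches_after=15),
        VariantCheck("rain", matches_before=40, matches_after=30),
    ]
    print(filter_variants(checks))
```

In this toy example the "night" variant is discarded because too few correspondences survive the alteration, while "snow" and "rain" are kept.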

Visual Localization Results

Figure 2: Localization accuracy as a function of the number k of top retrieved images, for Ret4Loc models and the state of the art. Results are shown for two protocols: a pose approximation protocol (EWB) and a Structure-from-Motion (SfM) based protocol. Ret4Loc-HOW-Synth variations using geometric consistency are denoted with a "+".

Place Recognition Results

Figure 3: Visual place recognition results. We report the usual metric, top-k recall, i.e. whether at least one correct image is retrieved in the top k. ∗ denotes results from GCL (Leyva-Vallina et al., 2023).
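As an illustration of the metric in the caption above, the following sketch computes top-k recall from ranked retrieval lists: a query counts as a hit when at least one of its correct database images appears among its first k results. The function name `recall_at_k`, the data layout, and the toy image identifiers are assumptions made for this example and are not taken from the Ret4Loc or HOW code.

```python
from typing import Sequence, Set


def recall_at_k(rankings: Sequence[Sequence[str]],
                positives: Sequence[Set[str]],
                k: int) -> float:
    """Fraction of queries with at least one correct image in their top-k results.

    rankings[i]  : database image ids for query i, best match first
    positives[i] : set of database image ids that are correct for query i
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if len(rankings) != len(positives):
        raise ValueError("rankings and positives must have the same length")
    if not rankings:
        raise ValueError("at least one query is required")

    hits = 0
    for ranking, correct in zip(rankings, positives):
        # A single correct image within the first k results is enough.
        if any(image_id in correct for image_id in ranking[:k]):
            hits += 1
    return hits / len(rankings)


if __name__ == "__main__":
    # Toy data: three queries with three retrieved images each.
    rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
    positives = [{"b"}, {"x"}, {"g", "i"}]
    for k in (1, 2, 3):
        print(k, recall_at_k(rankings, positives, k))
```

Because only one correct image is required per query, the value can only stay equal or grow as k increases.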

Pretrained Models

Here you can find links to Ret4Loc pretrained models. We built our codebase on top of the HOW codebase. You can use code from HOW to load and evaluate the Ret4Loc models.

We provide two sets of model weights (33 MB each):

ret4loc_how.pth – the baseline Ret4Loc-HOW model.
ret4loc_how_synth-pp.pth – our best Ret4Loc model, trained with synthetic data and geometric verification.

You can load our models exactly like a HOW model.

@inproceedings{DBLP:conf/iclr/KalantidisSRWLC24,
  author       = {Yannis Kalantidis and
                  Mert B{\"{u}}lent Sariyildiz and
                  Rafael S. Rezende and
                  Philippe Weinzaepfel and
                  Diane Larlus and
                  Gabriela Csurka},
  title        = {Weatherproofing Retrieval for Localization with Generative {AI} and
                  Geometric Consistency},
  booktitle    = {The Twelfth International Conference on Learning Representations,
                  {ICLR} 2024, Vienna, Austria, May 7-11, 2024},
  publisher    = {OpenReview.net},
  year         = {2024},
  url          = {https://openreview.net/forum?id=5EniAcsO7f},
  timestamp    = {Wed, 07 Aug 2024 17:11:53 +0200},
  biburl       = {https://dblp.org/rec/conf/iclr/KalantidisSRWLC24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
