Weatherproofing Retrieval for Localization with Generative AI & Geometric Consistency


Ret4Loc in three figures: (left) overview of our experimental validation, (middle) synthetic variants used to extend the training set, and (right) geometric consistency used to select those variants. More precisely: (Left) Relative gains in localization accuracy compared to the state of the art (black dot) for 7 outdoor and 1 indoor datasets, achieved by our best retrieval models trained with our method using only real images (Ret4Loc) or real and synthetic images (Ret4Loc + Synth). Axes are in log scale. (Middle) Original images and several of their synthetic variants obtained with different prompts. (Right) Estimated local correspondences between two matching images, before and after replacing the left image with one of its synthetic variants. Variants that fail this verification step are discarded from the training set.


State-of-the-art visual localization approaches generally rely on a first image retrieval step whose role is crucial. Yet retrieval often struggles under varying conditions, caused, e.g., by changing weather or time of day, with dramatic consequences for localization accuracy. In this paper, we improve this retrieval step and tailor it to the final localization task. Among the several changes we advocate, we propose to synthesize variants of the training set images with generative text-to-image models, automatically expanding the training set towards a number of nameable variations that particularly hurt visual localization. These changes result in Ret4Loc, a training approach that learns from such synthetic variants together with real images and that leverages geometric consistency for filtering and sampling. Experiments show that it leads to large improvements on multiple challenging visual localization and place recognition benchmarks.
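The geometric-consistency filter can be illustrated with a simplified sketch: given local correspondences between a real image and its synthetic variant, fit a global 2D transform and keep the variant only if enough correspondences agree with it. This is a minimal, illustrative version (a least-squares 2D similarity fit; function names and thresholds are ours, and the actual pipeline relies on local-feature matching with spatial verification, not this exact procedure):

```python
import numpy as np

def count_inliers(pts_a, pts_b, thresh=3.0):
    """Fit a 2D similarity transform (scale, rotation, translation)
    mapping pts_a -> pts_b by least squares, then count the
    correspondences whose reprojection error is below `thresh` pixels."""
    n = len(pts_a)
    # Solve for [a, b, tx, ty] in x' = a*x - b*y + tx, y' = b*x + a*y + ty
    A = np.zeros((2 * n, 4))
    A[0::2, 0] = pts_a[:, 0]; A[0::2, 1] = -pts_a[:, 1]; A[0::2, 2] = 1
    A[1::2, 0] = pts_a[:, 1]; A[1::2, 1] =  pts_a[:, 0]; A[1::2, 3] = 1
    params, *_ = np.linalg.lstsq(A, pts_b.reshape(-1), rcond=None)
    a, b, tx, ty = params
    R = np.array([[a, -b], [b, a]])
    proj = pts_a @ R.T + np.array([tx, ty])
    errors = np.linalg.norm(proj - pts_b, axis=1)
    return int((errors < thresh).sum())

def keep_variant(pts_real, pts_synth, min_inliers=20):
    """Keep a synthetic variant only if enough correspondences
    survive the geometric check (threshold is illustrative)."""
    return count_inliers(pts_real, pts_synth) >= min_inliers
```

A variant whose correspondences are mostly consistent with a single global transform passes; one where generation has distorted the scene geometry accumulates large residuals and is dropped.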

Visual Localization Results

Figure-2: Localization accuracy as a function of the top-k retrieved images for Ret4Loc models and the state of the art. Results are shown for two protocols: a pose-approximation protocol (EWB) and a Structure-from-Motion (SfM) based protocol. Ret4Loc-HOW-Synth variants using geometric consistency are denoted with a "+".
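As a rough sketch of what the pose-approximation evaluation measures: the query camera position is approximated from the poses of the top-k retrieved database images, and accuracy is the fraction of queries within given error thresholds. The snippet below is a position-only simplification under assumptions of ours (EWB also involves the retrieved rotations; function names and thresholds are illustrative):

```python
import numpy as np

def ewb_position(db_positions, retrieved_ids, k=5):
    """Equal-weighted barycenter of the camera positions of the
    top-k retrieved database images (position-only simplification)."""
    top = list(retrieved_ids[:k])
    return db_positions[top].mean(axis=0)

def position_accuracy(pred, gt, thresholds=(0.25, 0.5, 5.0)):
    """Fraction of queries whose position error (in meters) is
    within each threshold."""
    err = np.linalg.norm(pred - gt, axis=1)
    return [float((err <= t).mean()) for t in thresholds]
```

Better retrieval places the top-k images closer to the true query pose, which directly tightens the barycenter and improves these accuracy numbers.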

Place Recognition Results

Figure-3: Visual place recognition results. We report the usual metric (top-k recall), i.e. whether at least one correct image is retrieved among the top k. ∗ denotes results from GCL (Leyva-Vallina et al., 2023).
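For concreteness, the top-k recall metric can be computed as follows (a minimal sketch; the names are ours):

```python
import numpy as np

def recall_at_k(rankings, positives, ks=(1, 5, 10)):
    """rankings: per-query ranked lists of database indices (best first);
    positives: per-query sets of correct database indices.
    Returns, for each k, the fraction of queries with at least one
    correct image among the top k retrieved."""
    out = {}
    for k in ks:
        hits = sum(1 for ranked, pos in zip(rankings, positives)
                   if any(i in pos for i in ranked[:k]))
        out[k] = hits / len(rankings)
    return out
```

A query counts as solved at rank k as soon as one correct database image appears in the top k, regardless of how many other positives exist.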

Pretrained Models

Here you can find links to the Ret4Loc pretrained models. Our codebase is built on top of the HOW codebase; you can use the HOW code to load and evaluate the Ret4Loc models.

We provide two model weights (33 MB each):

You can load our models exactly like a HOW model.
