Abstract
Scene coordinate regression (SCR), i.e., predicting 3D coordinates for every pixel of a given image, has recently shown promising potential. However, existing methods remain mostly scene-specific or limited to small scenes, and thus hardly scale to realistic datasets. In this paper, we propose a new paradigm in which a single generic SCR model is trained once and then deployed to new test scenes, regardless of their scale and without further finetuning. For a given query image, it collects inputs from off-the-shelf image retrieval techniques and Structure-from-Motion databases: a list of relevant database images with sparse pointwise 2D-3D annotations. The model is based on the transformer architecture and can take a variable number of images and sparse 2D-3D annotations as input. It is trained on a few diverse datasets and significantly outperforms other scene coordinate regression approaches, including scene-specific models, on several visual localization benchmarks. In particular, we set a new state of the art on the Cambridge localization benchmark, even outperforming feature-matching-based approaches.
Method overview
Given a query image and a set of related views with sparse 2D/3D annotations retrieved from a database, SACReg predicts absolute 3D coordinates for each pixel of the query image. These predictions can then be used for visual localization with a robust PnP algorithm, as sketched below. Importantly, SACReg is scene-agnostic: it does not need any retraining for new datasets; only the images and 2D-3D annotations that serve as input are scene-specific.
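As a concrete illustration, the minimal sketch below shows how a camera pose could be recovered from a dense scene-coordinate prediction using OpenCV's robust PnP solver. The `coords`, `conf`, and `K` inputs are placeholders for the predicted 3D point map, its confidence map, and the query camera intrinsics; they are illustrative names, not part of any released API.

```python
import numpy as np
import cv2

def localize_from_scene_coords(coords, conf, K, conf_thresh=0.5):
    """Estimate the query camera pose from a dense point map (illustrative sketch).

    coords: (H, W, 3) predicted absolute 3D coordinates per pixel (assumed)
    conf:   (H, W) per-pixel confidence (assumed)
    K:      (3, 3) query camera intrinsic matrix
    """
    H, W = conf.shape
    # Pixel grid matching the predicted 3D coordinates
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    mask = conf > conf_thresh  # keep only confident pixels
    pts2d = np.stack([u[mask], v[mask]], axis=-1).astype(np.float64)
    pts3d = coords[mask].astype(np.float64)
    # Robust PnP: RANSAC rejects outliers among the dense 2D-3D correspondences
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, distCoeffs=None,
        iterationsCount=1000, reprojectionError=5.0)
    if not ok:
        raise RuntimeError("PnP failed to find a pose")
    R, _ = cv2.Rodrigues(rvec)  # world-to-camera rotation
    return R, tvec, inliers
```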
Regression examples
Below are regression examples on Aachen-Day, a dataset on which SACReg has not been trained. Our model predicts a dense 3D coordinate point map and a confidence map for a given query image, using reference images retrieved from an SfM database. Only the first 3 reference images (out of 8) are shown. For visualization purposes, the 3D coordinates and confidence are colorized, and low-confidence areas are not displayed.
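For reference, the short sketch below illustrates one way such a visualization could be produced, assuming hypothetical `coords` (H x W x 3) and `conf` (H x W) arrays for the predicted point map and confidence map: each 3D axis is normalized to [0, 1] and mapped to an RGB channel, and pixels below a confidence threshold are blanked out.

```python
import numpy as np

def colorize_coords(coords, conf, conf_thresh=0.5):
    """Map predicted 3D coordinates to RGB, hiding low-confidence pixels (illustrative)."""
    # Normalize each coordinate axis independently to [0, 1] so XYZ maps to RGB
    lo = coords.min(axis=(0, 1))
    hi = coords.max(axis=(0, 1))
    rgb = (coords - lo) / np.maximum(hi - lo, 1e-6)
    # Blank out low-confidence areas (shown as white)
    rgb[conf < conf_thresh] = 1.0
    return rgb  # (H, W, 3) array, displayable with any image viewer
```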