R2D2: Repeatable and Reliable Detector and Descriptor - Naver Labs Europe

Treating keypoint reliability as part of the keypoint detection problem significantly improves feature matching.

Reconstructing objects in 3D, finding objects in a database of images, localizing images in a given environment and many other computer vision applications all depend on being able to detect and describe keypoints.

For handcrafted keypoints, a human tells the computer what a keypoint should look like: typically edges, corners or other so-called ‘salient regions’. For learning-based keypoints, the detector is designed so that the computer can decide by itself which regions of the image contain ‘interesting keypoints’ for the target application. A detailed description of the evolution from handcrafted to learning-based keypoints can be found here.

But whether we talk about ‘salient regions’ or ‘interesting keypoints’, the most important question is what makes a good keypoint.

Traditionally, a good keypoint has been defined as ‘repeatable’, meaning it can be detected in multiple images even when they are subject to domain-specific transformations such as viewpoint changes. However, in the applications mentioned above, ‘repeatable’ detection isn’t enough because we also need to find ‘reliable’ keypoint correspondences across multiple images. In other words, we need reliable ‘keypoint matches’, where a match is reliable if the similarity between the best candidate and every other candidate is low, i.e. the keypoint’s appearance is unique in the image. This requirement doesn’t hold for areas of an image that lack texture, such as a blue sky, or for repetitive areas, such as windows on the façade of a building: patches extracted there look very similar to one another, which makes them unreliable for matching.
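
To make the idea of a ‘reliable match’ concrete, here is a minimal sketch in plain Python. It is not the R2D2 method: it simply scores a match by the gap between the best and second-best cosine similarity, so a unique patch gets a large margin and a repetitive one gets a margin close to zero. The function names and the toy three-dimensional descriptors are assumptions made up for this illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def match_with_margin(query, candidates):
    """Return (index of best candidate, best similarity minus runner-up similarity).

    Needs at least two candidates. A small margin means the best match is
    barely better than the next one, i.e. the match is not distinctive.
    """
    sims = [cosine(query, c) for c in candidates]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, sims[best] - sims[runner_up]

# Toy descriptors (hypothetical): one distinctive corner and two
# near-identical windows on the same facade.
candidates = [
    [1.0, 0.0, 0.0],    # distinctive corner
    [0.0, 1.0, 0.0],    # window 1
    [0.0, 0.995, 0.1],  # window 2, almost the same as window 1
]

corner_idx, corner_margin = match_with_margin([1.0, 0.0, 0.0], candidates)
window_idx, window_margin = match_with_margin([0.0, 1.0, 0.0], candidates)

print("corner:", corner_idx, round(corner_margin, 3))
print("window:", window_idx, round(window_margin, 3))
```

Both queries find the right candidate here, but the window match wins by a tiny margin, so a small amount of noise could flip it. That is the sense in which repetitive regions are unreliable for matching even when they can be detected repeatably.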


The reliability problem has been the subject of extensive study in the field of correspondence analysis but, up until now, it’s been completely ignored during keypoint extraction.

In our work, we propose treating reliability as part of the detection problem and formulate it with a modified ranking loss. We show that this approach leads to results beyond the state of the art on classic feature matching benchmarks (HPatches) as well as on challenging visual localization datasets. The latter especially highlight the ability of our data-driven approach to cope with very difficult scenarios such as matching day-time with night-time images.
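
The exact loss is given in the paper. As a rough illustration only of how a ranking loss might be modified to account for reliability, the sketch below weights a per-patch ranking score (here an average precision, AP, taken as given rather than computed) by a predicted reliability R in [0, 1], and falls back to a constant kappa where R is low. The function name, the plain-list inputs and the value kappa = 0.5 are assumptions for this example.

```python
def reliability_weighted_loss(ap_scores, reliabilities, kappa=0.5):
    """Mean over patches of 1 - (AP * R + kappa * (1 - R)).

    ap_scores:     per-patch ranking quality in [0, 1] (assumed precomputed)
    reliabilities: per-patch predicted reliability R in [0, 1]
    kappa:         assumed threshold; patches whose AP is below kappa are
                   cheaper to mark as unreliable (R near 0) than as reliable
    """
    if len(ap_scores) != len(reliabilities) or not ap_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = [
        1.0 - (ap * r + kappa * (1.0 - r))
        for ap, r in zip(ap_scores, reliabilities)
    ]
    return sum(terms) / len(terms)

# A well-ranked patch (AP = 0.9) is rewarded for high reliability...
good_trusted = reliability_weighted_loss([0.9], [1.0])
good_ignored = reliability_weighted_loss([0.9], [0.0])

# ...while a poorly ranked patch (AP = 0.2) is rewarded for low reliability.
bad_trusted = reliability_weighted_loss([0.2], [1.0])
bad_ignored = reliability_weighted_loss([0.2], [0.0])

print(good_trusted < good_ignored, bad_ignored < bad_trusted)
```

Under these assumptions, the loss pushes the predicted reliability up where descriptors rank their true match well and down where they do not, which is one way a detector could learn to avoid sky or repetitive facades.
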
All the details are in the paper and we’d be happy to discuss it at NeurIPS 2019 during the oral and poster sessions, as well as at the NAVER booth from Sunday, December 8th to Wednesday, December 11th.

NeurIPS 2019:

Code and models: https://github.com/naver/r2d2

This blog post was first published on 4 December 2019.
