From handcrafted to deep local features for computer vision applications

Published by Gabriela Csurka on 28 February 2019

Gabriela Csurka Khedari, Christopher Dance, Martin Humenberger

An article we recently published on arXiv gives an overview of the evolution of local features – from handcrafted to deep-learning-based methods – and a discussion of several benchmarks and papers that evaluate local features. We also provide references to most of the relevant literature and, whenever possible, link to code and data that are available to the community.

This blog post is a digest of the article, summarised in the three figures that follow:

Chronological overview of the methodologies and example methods considered
A table summarizing the different aspects considered in the paper
A sample of what we found that improves results across models, matching, features

What often improves matching results is to post-process descriptors by whitening, power-law normalisation, and L2-normalisation.
Deep models TFeat, L2-Net and HardNet can be improved by (kernel) subspace pooling (SP, KSP) or bilinear pooling, as well as by adding a global loss (TGLoss) or global orthogonal regularization (GOR).
Optimising the average precision (DOAP) instead of using a pairwise or triplet loss improves local patch verification, matching and retrieval results.
In addition to the advantages of a low memory footprint and short matching time, deep-learned binary features, such as binary DOAP, achieve competitive results on recent benchmarks.
Even though learning approaches have advanced to the extent that they now attain the highest mean average precision on matching, recent benchmarks that target their application in image-based reconstruction and localisation pipelines suggest that handcrafted features such as SIFT still perform just as well as, or even better than, recent deep-learned features on such tasks.
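
The descriptor post-processing steps mentioned above (whitening, power-law normalisation and L2-normalisation) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation used in any of the surveyed papers: the function names `fit_whitening` and `postprocess`, the exponent `alpha=0.5`, the small `eps` constant, and the order of the three steps (taken from the order in which they are listed) are all assumptions made for the example.

```python
import numpy as np


def fit_whitening(train, eps=1e-8):
    """Estimate a PCA-whitening transform from training descriptors (one per row)."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    # The covariance matrix is symmetric, so eigh is the appropriate solver.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Clip tiny negative eigenvalues caused by round-off; eps avoids division by zero.
    scale = np.sqrt(np.maximum(eigvals, 0.0) + eps)
    # Divide each eigenvector (column) by the standard deviation along it.
    proj = eigvecs / scale
    return mean, proj


def postprocess(desc, mean, proj, alpha=0.5):
    """Whiten, apply signed power-law normalisation, then L2-normalise each row."""
    x = (desc - mean) @ proj
    # Signed power law: keeps the sign, compresses large magnitudes (alpha is assumed).
    x = np.sign(x) * np.abs(x) ** alpha
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)
```

In this sketch the whitening transform is fitted once on a set of training descriptors and then applied to every new descriptor. After the final step all rows have unit length, so Euclidean distance and cosine similarity give the same ranking of matches.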

We hope that this largely chronological presentation will help readers better understand the topic of local feature extraction and description, and make the best use of it in modern computer vision applications.

Gabriela Csurka is a senior scientist in the Computer Vision research group, Martin Humenberger leads the 3D Vision research group, and Christopher R. Dance is a research fellow at NAVER LABS Europe. Full paper: From handcrafted to deep local invariant features
