An article we recently published on arXiv gives an overview of the evolution of local features – from handcrafted to deep-learning-based methods – and a discussion of several benchmarks and papers that evaluate local features. We also provide references to most of the relevant literature and, whenever possible, link to code and data that are available to the community.
This blog post is a digest of the article, summarised in the three figures that follow:
- A chronological overview of the methodologies and example methods considered
- A table summarising the different aspects considered in the paper
- A sample of our findings on what improves results across models, matching and features
- Matching results are often improved by post-processing descriptors with whitening, power-law normalisation and L2-normalisation.
- Deep models TFeat, L2-Net and HardNet can be improved by (kernel) subspace pooling (SP, KSP) or bilinear pooling, as well as by adding a global loss (TGLoss) or global orthogonal regularisation (GOR).
- Optimising the average precision directly (DOAP), instead of using a pairwise or triplet loss, improves local patch verification, matching and retrieval results.
- In addition to their low memory footprint and fast matching time, deep-learned binary features, such as binary DOAP, achieve competitive results on recent benchmarks.
- Even though learning approaches now attain the highest mean average precision on matching, recent benchmarks targeting image-based reconstruction and localisation pipelines suggest that handcrafted features such as SIFT still perform as well as, or even better than, recent deep-learned features on such tasks.
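The descriptor post-processing mentioned in the first bullet above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the whitening transform is learned by PCA from a set of training descriptors (the `alpha=0.5` power-law exponent and the epsilon values are common defaults, assumed here for illustration).

```python
import numpy as np

def powerlaw_normalise(x, alpha=0.5):
    # Power-law (signed square-root) normalisation dampens
    # the influence of bursty descriptor dimensions.
    return np.sign(x) * np.abs(x) ** alpha

def l2_normalise(x, eps=1e-12):
    # Project each descriptor onto the unit sphere.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fit_whitening(train, eps=1e-5):
    # Learn a PCA-whitening transform from training descriptors:
    # centre, then rotate by the eigenvectors of the covariance
    # and rescale each axis by 1/sqrt(eigenvalue).
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs / np.sqrt(eigvals + eps)  # scale each column
    return mean, W

def postprocess(desc, mean, W, alpha=0.5):
    # The full chain: whitening -> power-law -> L2-normalisation.
    white = (desc - mean) @ W
    return l2_normalise(powerlaw_normalise(white, alpha))

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 64))   # descriptors to fit whitening on
query = rng.normal(size=(5, 64))      # descriptors to post-process
mean, W = fit_whitening(train)
out = postprocess(query, mean, W)     # unit-norm, whitened descriptors
```

After this chain, Euclidean distances between the processed descriptors tend to be a more reliable matching criterion than distances between the raw descriptors.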
We hope this largely chronological presentation helps readers better understand local feature extraction and description, so as to make the best use of it in modern computer vision applications.
Gabriela Csurka is a senior scientist in the Computer Vision research group. Martin Humenberger leads the 3D Vision research group, and Christopher R. Dance is a research fellow at NAVER LABS Europe. Full paper: From handcrafted to deep local invariant features