End-to-End Learning of Deep Visual Representations for Image Retrieval

Published by NAVER LABS Europe on 29 September 2017

Albert Gordo, Jon Almazán, Jérôme Revaud, Diane Larlus

International Journal of Computer Vision (IJCV), 124 (2), pp. 237-254

@article{gordo2017end,
  title={End-to-end learning of deep visual representations for image retrieval},
  author={Gordo, Albert and Almazan, Jon and Revaud, Jerome and Larlus, Diane},
  journal={International Journal of Computer Vision},
  volume={124},
  number={2},
  pages={237--254},
  year={2017},
  publisher={Springer}
}

While deep learning has become a key ingredient in the top performing methods for many computer vision tasks, it has failed so far to bring similar improvements to instance-level image retrieval. In this article, we argue that reasons for the underwhelming results of deep methods on image retrieval are threefold: (1) noisy training data, (2) inappropriate deep architecture, and (3) suboptimal training procedure. We address all three issues. First, we leverage a large-scale but noisy landmark dataset and develop an automatic cleaning method that produces a suitable training set for deep retrieval. Second, we build on the recent R-MAC descriptor, show that it can be interpreted as a deep and differentiable architecture, and present improvements to enhance it. Last, we train this network with a siamese architecture that combines three streams with a triplet loss. At the end of the training process, the proposed architecture produces a global image representation in a single forward pass that is well suited for image retrieval. Extensive experiments show that our approach significantly outperforms previous retrieval approaches, including state-of-the-art methods based on costly local descriptor indexing and spatial verification. On Oxford 5k, Paris 6k and Holidays, we respectively report 94.7, 96.6, and 94.8 mean average precision. Our representations can also be heavily compressed using product quantization with little loss in accuracy.
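The triplet-loss training and single-descriptor retrieval described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the margin value of 0.1, and the choice of squared Euclidean distance on L2-normalized descriptors are assumptions made for illustration, and the descriptors are plain vectors standing in for the network's output.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each vector (last axis) to unit Euclidean length."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def triplet_loss(query, positive, negative, margin=0.1):
    """Hinge-style triplet loss on global descriptors (illustrative).

    The loss is zero once the query is closer to the relevant image
    than to the non-relevant one by at least `margin` (assumed value).
    """
    q, p, n = (l2_normalize(v) for v in (query, positive, negative))
    d_pos = np.sum((q - p) ** 2, axis=-1)  # squared distance to relevant image
    d_neg = np.sum((q - n) ** 2, axis=-1)  # squared distance to non-relevant image
    return 0.5 * np.maximum(0.0, margin + d_pos - d_neg)


def rank_database(query, database):
    """Return database indices sorted from most to least similar.

    With unit-length descriptors, the dot product equals cosine similarity,
    so retrieval reduces to one matrix-vector product.
    """
    q = l2_normalize(query)
    db = l2_normalize(database)
    scores = db @ q
    return np.argsort(-scores)


if __name__ == "__main__":
    # Toy 2-D descriptors, purely hypothetical values.
    print(triplet_loss([1, 0], [1, 0], [0, 1]))          # well-separated triplet
    print(triplet_loss([1, 0], [0, 1], [1, 0]))          # violated triplet
    print(rank_database([1, 0], [[0, 1], [1, 0], [1, 1]]))
```

During training, the three inputs would come from the three weight-sharing streams of the network; at query time only `rank_database` is needed, since each image is reduced to one global descriptor.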
