This article addresses the detection of prominent objects in images. In contrast to standard approaches based on sliding windows, we study a fundamentally different solution: we formulate the supervised prediction of a bounding box as an image retrieval task. Given a global image descriptor, we find the most similar images in an annotated dataset and transfer their object bounding boxes to the query image. We refer to this approach as data-driven detection (DDD). The key novelty of the work is to design or learn image similarities that explicitly optimize the accuracy of the transfer, whereas previous work uses generic representations and unsupervised similarities. We do this in two ways. First, we explicitly learn to transfer, by adapting a metric learning approach to work with image and bounding box pairs. Second, we use an image representation designed to be more consistent with the objective of transferring bounding boxes: a representation of images as object probability maps computed from low-level patch classifiers. We show experimentally that these two contributions are crucial enablers of DDD as a very competitive method for prominent object detection, in some cases yielding results comparable to or better than state-of-the-art detectors, despite its conceptual simplicity and its efficiency at runtime. Our third contribution is an application of prominent object detection, in which we improve fine-grained categorization by pre-cropping images with the proposed approach. We also discuss and experimentally evaluate an extension of the proposed approach that detects multiple parts of rigid objects.
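The retrieve-and-transfer step described above can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the dot-product similarity stands in for the learned similarity, and the similarity-weighted averaging of the top-k boxes, the value of `k`, the normalized box format, and the names `dot` and `transfer_box` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used as a stand-in for a learned similarity."""
    return sum(x * y for x, y in zip(a, b))


def transfer_box(
    query: Sequence[float],
    database: List[Tuple[Sequence[float], Box]],
    k: int = 2,
) -> Box:
    """Predict a box for `query` from its k most similar annotated images."""
    if not database:
        raise ValueError("the annotated database is empty")

    # Score every annotated image against the query descriptor,
    # then keep the k best matches (highest similarity first).
    scored = sorted(
        ((dot(query, desc), box) for desc, box in database),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Assumption: combine the neighbors' boxes by a similarity-weighted
    # average, clamping negative similarities to zero.
    weights = [max(score, 0.0) for score, _ in scored]
    total = sum(weights)
    if total == 0.0:
        # No neighbor has positive similarity: copy the best match's box.
        return scored[0][1]

    coords = [
        sum(w * box[i] for w, (_, box) in zip(weights, scored)) / total
        for i in range(4)
    ]
    return (coords[0], coords[1], coords[2], coords[3])


# Toy annotated dataset: (global descriptor, bounding box) pairs.
database = [
    ([1.0, 0.0], (0.1, 0.1, 0.5, 0.5)),
    ([0.0, 1.0], (0.5, 0.5, 0.9, 0.9)),
    ([1.0, 0.0], (0.3, 0.3, 0.7, 0.7)),
]
predicted = transfer_box([1.0, 0.0], database, k=2)
print(predicted)
```

In this toy example the two retrieved neighbors are the first and third entries, so the predicted box is the average of their boxes. Replacing `dot` with a similarity trained for transfer accuracy is where the approach described above departs from generic retrieval.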
Full paper available on IEEE Xplore digital library: http://www.ieeeexplore.ws