Leveraging Image, Text and Cross-media Similarities for Diversity-focused Multimedia Retrieval

Published by NAVER LABS Europe on 7 April 2013

Julien Ah-Pine, Stéphane Clinchant, Gabriela Csurka, Florent Perronnin, Jean-Michel Renders

Book chapter, ImageCLEF, Experimental Evaluation in Visual Information Retrieval, Eds: H. Müller, P. Clough, T. Deselaers, B. Caputo, Springer Book, ISBN 978-3-642-15180-4. Full paper available on Springer Website

This chapter summarizes the different cross-modal information retrieval techniques that Xerox Research Centre implemented during three years of participation in the ImageCLEF Photo tasks. The main challenge remained constant: how to optimally couple visual and textual similarities when they capture things at different semantic levels and when one of the media (the textual one) gives, most of the time, much better retrieval performance. Some core components turned out to be very effective throughout those years: the visual similarity metrics based on the Fisher Vector representation of images, and the cross-media similarity principle based on relevance models. Other components were introduced to solve additional issues: we tried different query- and document-enrichment methods, exploiting auxiliary resources such as Flickr or open-source thesauri, or applying statistical semantic smoothing. We also implemented clustering mechanisms to promote diversity in the top results and to provide faster access to relevant information. This chapter describes, analyses and assesses each of these components, namely: the mono-modal similarity measures, the different cross-media similarities, the query and document enrichment, and finally the mechanisms that ensure diversity in what is proposed to the user. To conclude, we discuss the numerous lessons we have learnt over the years by trying to solve this very challenging task.
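
To make the ideas of coupling textual and visual similarities and of promoting diversity more concrete, here is a minimal Python sketch. It is not the chapter's implementation. The function names, the normalisation, the weight `alpha`, the `penalty` parameter and the toy data are all assumptions made for illustration. The cross-media step is one simple pseudo-relevance-feedback reading of a "relevance model" principle, and the diversity step uses a greedy re-ranking in place of the clustering mechanisms the chapter mentions.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def cross_media_scores(text_scores: Sequence[float], visual_sim: Matrix, k: int = 2) -> List[float]:
    """Assumed pseudo-relevance-feedback step: the k documents with the best
    text score vote for every document in proportion to visual similarity."""
    n = len(text_scores)
    top = sorted(range(n), key=lambda i: text_scores[i], reverse=True)[:k]
    total = sum(text_scores[j] for j in top) or 1.0  # avoid division by zero
    return [sum(text_scores[j] * visual_sim[j][i] for j in top) / total for i in range(n)]


def fuse(text_scores: Sequence[float], cross_scores: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Linear late fusion of the textual and cross-media scores (illustrative choice)."""
    return [alpha * t + (1.0 - alpha) * c for t, c in zip(text_scores, cross_scores)]


def diversify(scores: Sequence[float], sim: Matrix, top_n: int, penalty: float = 1.0) -> List[int]:
    """Greedy re-ranking: repeatedly pick the document with the best score
    minus a penalty for similarity to the documents already selected."""
    selected: List[int] = []
    remaining = set(range(len(scores)))
    while remaining and len(selected) < top_n:
        def gain(i: int) -> float:
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return scores[i] - penalty * redundancy
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): documents 0 and 1 are visual near-duplicates,
# document 2 has a weak caption but looks like document 0.
text_scores = [0.9, 0.8, 0.1, 0.3]
visual_sim = [
    [1.0, 0.95, 0.7, 0.1],
    [0.95, 1.0, 0.6, 0.1],
    [0.7, 0.6, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

fused = fuse(text_scores, cross_media_scores(text_scores, visual_sim, k=2))
print(diversify(fused, visual_sim, top_n=3, penalty=0.0))  # plain fused ranking
print(diversify(fused, visual_sim, top_n=3, penalty=1.0))  # diversity-aware ranking
```

In this toy setting the cross-media step lifts document 2 above document 3 even though its text score is lower, and the penalised re-ranking pushes the near-duplicate document 1 below the visually distinct document 3.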
