This year, our participation in ImageCLEF 2008 (Photo Retrieval Sub-task) was motivated by three problems: visual concept detection and its exploitation in a retrieval context, multimedia fusion methods for improved retrieval performance, and diversity-based re-ranking methods. From a purely visual perspective, the representation based on Fisher vectors derived from a generative mixture model appeared to be effective for both visual concept detection and content-based image retrieval. From a multimedia perspective, we used an intermediate fusion approach based on cross-media relevance feedback, which can be seen as a multigraph-based query regularization method with alternating steps. Finally, as one of the main goals of the organizers was to promote both relevance and diversity in the retrieval outputs, we designed and assessed several re-ranking strategies that turned out to preserve standard retrieval performance (such as precision at 20 or mean average precision) while significantly decreasing the redundancy in the top documents. These re-ranking strategies were designed either as variants of the well-known maximal marginal relevance principle or on the basis of an explicit clustering algorithm.
The main lessons drawn from our participation in ImageCLEF-Photo were:
– in the case of pure text-based retrieval, enriching both documents and queries with a thesaurus improves the results, and combining the former with query expansion using pseudo-relevance feedback improves them further;
– Fisher vectors are rich image signatures and achieve state-of-the-art performance in both visual concept detection and content-based image retrieval;
– the use of visual concepts increases retrieval performance when combined with pure text, but this advantage is lost when we use other, more complex multimedia fusion mechanisms based on lower-level features than the visual concepts;
– combining the two mono-media information sources (image and text) using trans-media pseudo-relevance feedback significantly improves the retrieval results (by more than 50% relative);
– concerning diversity, most of the strategies that we proposed succeeded in reducing the redundancy in the top documents. As none of the techniques explicitly used the provided clustering criterion (e.g. diversifying according to cities, states, sports, etc.), the CR20 score was not always significantly increased, and in a few cases it even decreased. This is not surprising, as we were seeking to improve diversity in a blind (unsupervised) way.
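To make the last point concrete, the following is a minimal sketch of a greedy re-ranking in the spirit of the maximal marginal relevance principle, together with a cluster-recall measure. It is an illustration under stated assumptions, not the exact variants we submitted: the function names (`mmr_rerank`, `cluster_recall`), the cosine similarity between feature vectors, the trade-off parameter `lam` and the toy data layout are all hypothetical, and CR20 is assumed here to be the fraction of a topic's clusters represented among the top 20 documents.

```python
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_rerank(relevance: Dict[str, float],
               features: Dict[str, Sequence[float]],
               lam: float = 0.5,
               k: int = 20) -> List[str]:
    """Greedy MMR-style selection (illustrative sketch).

    At each step, pick the candidate that maximises
        lam * relevance - (1 - lam) * (max similarity to already selected docs).
    No cluster labels are used, so the diversification is blind (unsupervised).
    """
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(doc: str) -> float:
            redundancy = max(
                (cosine(features[doc], features[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[doc] - (1.0 - lam) * redundancy

        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def cluster_recall(ranking: Sequence[str],
                   cluster_of: Dict[str, str],
                   k: int = 20) -> float:
    """Fraction of the known clusters that appear in the top-k documents."""
    all_clusters = set(cluster_of.values())
    seen = {cluster_of[d] for d in ranking[:k] if d in cluster_of}
    return len(seen) / len(all_clusters) if all_clusters else 0.0
```

Because `mmr_rerank` penalises only feature-space similarity, it reduces redundancy but need not cover the clusters defined by the evaluation criterion (a city, a state, a sport), which is consistent with CR20 not always increasing. Setting `lam` to 1.0 recovers the plain relevance ranking.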