In this document, we first briefly recall our baseline methods for both text and image retrieval and describe our information fusion strategy, before giving specific details about our submitted runs.
For text retrieval, XRCE used either an Information-Based IR model or a Lexical Entailment IR model based on a statistical translation IR model. Alternatively, we also used an approach from CEA List that models the queries using, on the one hand, socially related Flickr tags and, on the other hand, Wikipedia concepts introduced in . Combining these runs showed that the approaches were rather complementary.
For image representation, we used a spatial pyramid of Fisher Vectors built on local orientation histograms and local RGB statistics. The dot product was used to define the similarity between two images, and the color-based and texture-based rankings were combined by simple score averaging.
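The visual ranking described above can be sketched as follows. This is a minimal illustration, not the actual system: the short toy vectors stand in for the real Fisher Vector signatures, and the names `dot`, `average_scores`, and `rank_images`, as well as the dictionary layout with `"color"` and `"texture"` keys, are assumptions made for the example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors, used as image similarity."""
    return sum(a * b for a, b in zip(u, v))


def average_scores(color_scores, texture_scores):
    """Fuse two score lists by simple averaging, one score per image."""
    return [(c + t) / 2.0 for c, t in zip(color_scores, texture_scores)]


def rank_images(query, collection):
    """Rank collection images against a query.

    `query` and every item of `collection` are dicts holding a "color"
    and a "texture" vector (hypothetical layout, toy stand-ins for the
    Fisher Vectors of each channel).
    Returns (indices sorted by decreasing fused score, fused scores).
    """
    color = [dot(query["color"], img["color"]) for img in collection]
    texture = [dot(query["texture"], img["texture"]) for img in collection]
    fused = average_scores(color, texture)
    order = sorted(range(len(collection)), key=lambda i: fused[i], reverse=True)
    return order, fused


# Toy data: three images with 2-dimensional channel vectors.
query = {"color": [1.0, 0.0], "texture": [0.0, 1.0]}
collection = [
    {"color": [0.5, 0.5], "texture": [0.5, 0.5]},
    {"color": [1.0, 0.0], "texture": [0.0, 1.0]},
    {"color": [0.0, 1.0], "texture": [1.0, 0.0]},
]
order, fused = rank_images(query, collection)
```

Averaging assumes that the two channels produce scores on comparable scales; whether and how the vectors are normalized beforehand is not specified here and is left out of the sketch.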
Finally, to combine visual and textual information, we used the so-called Late Semantic Combination (LSC) method, where the text expert is first used to retrieve semantically relevant documents, and then the visual and textual scores are averaged to rank these documents. This strategy allowed us to significantly improve over mono-modal retrieval performance. Late fusion of the best text experts from XRCE and from CEA, combined with our Fisher Vector based image run through LSC, led to a MAP of 37% (the best score obtained in the Challenge).
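The two steps of LSC can be illustrated with a short sketch. It rests on several assumptions that are not stated above: the text and visual scores are taken to be already normalized to a comparable range, the semantic filter keeps a fixed number of documents (the hypothetical parameter `top_k`), and a document without a visual score is given 0. The function name `late_semantic_combination` is likewise chosen for the example only.

```python
def late_semantic_combination(text_scores, visual_scores, top_k):
    """Sketch of a two-step late semantic combination.

    text_scores, visual_scores: dicts mapping document id -> score,
    assumed to be on comparable scales (an assumption of this sketch).
    top_k: hypothetical cut-off for the semantic filtering step.
    Returns (document ids by decreasing fused score, fused scores).
    """
    # Step 1: the text expert selects the semantically relevant documents.
    candidates = sorted(text_scores, key=text_scores.get, reverse=True)[:top_k]

    # Step 2: average textual and visual scores, for the candidates only.
    fused = {
        doc: (text_scores[doc] + visual_scores.get(doc, 0.0)) / 2.0
        for doc in candidates
    }
    ranking = sorted(fused, key=fused.get, reverse=True)
    return ranking, fused


# Toy scores: "c" is visually the closest match but textually irrelevant.
text_scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
visual_scores = {"a": 0.1, "b": 0.6, "c": 1.0, "d": 0.9}
ranking, fused = late_semantic_combination(text_scores, visual_scores, top_k=3)
```

In this toy run, document `c` never enters the final ranking despite having the highest visual score, because the text expert filters it out first; the visual scores only reorder the documents that were judged textually relevant.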