|Ginger Delmas, Rafael Sampaio De Rezende, Gabriela Csurka, Diane Larlus|
|Tenth International Conference on Learning Representations (ICLR), virtual event, 25 - 29 April, 2022|
A multi-modal query, i.e. a query composed of an example image and a companion sentence that modifies it, is a very intuitive way to search for images of a particular fashion article. Previous attempts at tackling this complex task have mostly focused on learning to compose the visual and textual descriptors of the query elements in order to directly compare the resulting representation to those of the candidate fashion target images. Our approach departs from this strategy. We propose two simple modules which draw inspiration from cross-modal retrieval and image search, respectively. Both research domains have been extensively studied, and their successes, when combined, can be used to effectively tackle our task, which lies at the intersection of these two families of approaches. We validate our method on several benchmarks with free-form text modifiers and obtain substantial performance improvements on several tasks.