ARTEMIS: attention-based retrieval with text-explicit matching and implicit similarity

Published by Ginger Delmas on 25 April 2022

Ginger Delmas, Rafael Sampaio De Rezende, Gabriela Csurka, Diane Larlus

Tenth International Conference on Learning Representations (ICLR), virtual event, 25-29 April, 2022

Abstract

A multi-modal query, i.e. a query composed of an example image and a companion sentence that modifies it, is a very intuitive way to search for images of a particular fashion article. Previous attempts at tackling this complex task have mostly focused on learning to compose the visual and textual descriptors of the query elements in order to directly compare the resulting representation to those of the candidate fashion target images. Our approach departs from this strategy. We propose two simple modules which draw inspiration from cross-modal retrieval and image search, respectively. These two research domains have been extensively studied and their successes, when combined, can be used to effectively tackle our task, which lies at the intersection of both families of approaches. We validate our method on several benchmarks with free-form text modifiers and obtain substantial performance improvements on several tasks.
