Learning Visual Representations with Caption Annotations



  • We have released the weights of the ICMLM models pretrained on MS-COCO.
  • Our demo, where you can visualize the attention maps of our ICMLM model for a set of MS-COCO (image, caption, maskable token) triplets, is live!

Pretrained models:

Model | Backbone | Dataset | Label Set | Pretrained Model File
ICMLM_att-fc | ResNet50 | MS-COCO | 5K Nouns + Adjectives + Verbs | icmlm-attfc_r50_coco_5K.pth
ICMLM_tfm | ResNet50 | MS-COCO | 5K Nouns + Adjectives + Verbs | icmlm-tfm_r50_coco_5K.pth
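
The two file names above appear to follow a common pattern: model variant, backbone, dataset, and label-set size, joined by underscores. As a minimal sketch, assuming that pattern holds (it is inferred only from the two names listed here, not documented by the authors), a script that manages several checkpoints could recover those fields as shown below. The names `CheckpointInfo` and `parse_checkpoint_name` are hypothetical helpers, not part of any released code.

```python
from typing import NamedTuple


class CheckpointInfo(NamedTuple):
    """Fields recovered from a checkpoint file name (hypothetical helper)."""
    variant: str    # e.g. "attfc" or "tfm"
    backbone: str   # e.g. "r50" for ResNet50
    dataset: str    # e.g. "coco" for MS-COCO
    label_set: str  # e.g. "5K"


def parse_checkpoint_name(filename: str) -> CheckpointInfo:
    """Split a name such as 'icmlm-tfm_r50_coco_5K.pth' into its parts.

    Assumes the pattern 'icmlm-<variant>_<backbone>_<dataset>_<labelset>.pth',
    which is inferred from the two listed files only.
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext != "pth":
        raise ValueError(f"expected a .pth file, got {filename!r}")

    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields in {stem!r}")
    model, backbone, dataset, label_set = parts

    prefix, dash, variant = model.partition("-")
    if prefix != "icmlm" or not dash or not variant:
        raise ValueError(f"expected an 'icmlm-<variant>' prefix in {model!r}")

    return CheckpointInfo(variant, backbone, dataset, label_set)


if __name__ == "__main__":
    for name in ("icmlm-attfc_r50_coco_5K.pth", "icmlm-tfm_r50_coco_5K.pth"):
        print(name, "->", parse_checkpoint_name(name))
```

Names that do not match the assumed pattern raise `ValueError` rather than being guessed at, so a future file with a different layout fails loudly.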

Qualitative results:

Learning Visual Representations with Caption Annotations


author = {Sariyildiz, Mert Bulent and Perez, Julien and Larlus, Diane},
title = {Learning Visual Representations with Caption Annotations},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020}
