Learning Visual Representations with Caption Annotations - Naver Labs Europe

News:

  • We have released the weights of the ICMLM (image-conditioned masked language modeling) models pretrained on MS-COCO.
  • Our demo is live: it lets you visualize the attention maps of our ICMLM model on a set of MS-COCO (image, caption, maskable token) triplets.
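ICMLM learns visual features by predicting masked words in a caption from visual cues, and the demo is organized around (image, caption, maskable token) triplets. Below is a minimal sketch, not the released pipeline, of how such triplets could be enumerated from one captioned image. It assumes lowercase whitespace tokenization and a toy label set standing in for the 5K nouns, adjectives and verbs used by the released models; the real pipeline may tokenize differently. `make_triplets`, `Triplet` and the image id `"img_001"` are hypothetical names for illustration.

```python
from typing import List, NamedTuple, Set

MASK = "[MASK]"


class Triplet(NamedTuple):
    image_id: str        # hypothetical identifier of the image
    masked_caption: str  # caption with one token replaced by [MASK]
    target: str          # the masked token the model should predict


def make_triplets(image_id: str, caption: str, label_set: Set[str]) -> List[Triplet]:
    """Return one triplet per caption token that belongs to the label set.

    Assumptions (illustrative only): lowercase whitespace tokenization and a
    toy label set; this is not the authors' preprocessing code.
    """
    tokens = caption.lower().split()
    triplets = []
    for i, tok in enumerate(tokens):
        # Only tokens in the label set are "maskable"; others stay visible.
        if tok in label_set:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            triplets.append(Triplet(image_id, " ".join(masked), tok))
    return triplets


if __name__ == "__main__":
    labels = {"dog", "frisbee", "catches", "brown"}
    for t in make_triplets("img_001", "A brown dog catches a frisbee", labels):
        print(t.masked_caption, "->", t.target)
```

Each triplet pairs the image with a caption in which exactly one label-set word is hidden, which is the unit the attention maps in the demo are computed for.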

Pretrained models:

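| Model        | Backbone | Dataset | Label set                     | Pretrained model file      |
|--------------|----------|---------|-------------------------------|----------------------------|
| ICMLM_att-fc | ResNet50 | MS-COCO | 5K nouns + adjectives + verbs | icmlm-attfc_r50_coco_5K.pth |
| ICMLM_tfm    | ResNet50 | MS-COCO | 5K nouns + adjectives + verbs | icmlm-tfm_r50_coco_5K.pth   |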
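The table lists the released checkpoint files but not their internal layout. Below is a minimal sketch, assuming PyTorch and torchvision are installed, of how one might reuse the ResNet50 backbone: inspect the top-level key prefixes, keep only the backbone's keys, and load them into a torchvision ResNet-50. The wrapper keys (`"state_dict"`, `"model"`), the `"module."` handling and the backbone prefix you pass in (for example a hypothetical `"cnn."`) are all assumptions; the helper names are hypothetical too. Check them against the actual file before relying on this.

```python
from collections import Counter
from typing import Any, Dict, Mapping


def _unwrap(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    # Hypothetical layout: weights may be nested under "state_dict" or "model".
    for key in ("state_dict", "model"):
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return inner
    return checkpoint


def _strip_dataparallel(name: str) -> str:
    # nn.DataParallel / DistributedDataParallel prepend "module." to every key.
    return name[len("module."):] if name.startswith("module.") else name


def top_level_prefixes(checkpoint: Mapping[str, Any]) -> Dict[str, int]:
    """Count parameters per top-level key prefix, to locate the backbone."""
    names = (_strip_dataparallel(n) for n in _unwrap(checkpoint))
    return dict(Counter(n.split(".", 1)[0] for n in names))


def extract_backbone_state_dict(checkpoint: Mapping[str, Any], prefix: str) -> Dict[str, Any]:
    """Keep only keys under `prefix` (e.g. a hypothetical "cnn.") and strip it."""
    backbone = {}
    for name, tensor in _unwrap(checkpoint).items():
        name = _strip_dataparallel(name)
        if name.startswith(prefix):
            backbone[name[len(prefix):]] = tensor
    return backbone


def load_resnet50_backbone(path: str, prefix: str):
    """Load a released .pth file into a torchvision ResNet-50 (needs torch, torchvision)."""
    import torch
    import torchvision

    # Recent PyTorch releases default to weights_only=True; a trusted file that
    # also stores non-tensor objects may need weights_only=False.
    checkpoint = torch.load(path, map_location="cpu")
    model = torchvision.models.resnet50()
    # strict=False: report, rather than fail on, keys that do not match
    # (e.g. the ImageNet classifier "fc.*" may be absent from the checkpoint).
    incompatible = model.load_state_dict(
        extract_backbone_state_dict(checkpoint, prefix), strict=False
    )
    return model, incompatible
```

Usage: call `top_level_prefixes(torch.load("icmlm-tfm_r50_coco_5K.pth", map_location="cpu"))` to see which prefix holds the ResNet50 weights, then pass that prefix to `load_resnet50_backbone`. The returned `missing_keys` and `unexpected_keys` show how well the keys matched.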

Qualitative results:

BibTeX:

@InProceedings{sariyildiz2020icmlm,
  author    = {Sariyildiz, Mert Bulent and Perez, Julien and Larlus, Diane},
  title     = {Learning Visual Representations with Caption Annotations},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2020}
}