Learning Visual Representations with Caption Annotations
News:
- Our paper, "Learning Visual Representations with Caption Annotations", has been accepted at the European Conference on Computer Vision (ECCV) 2020, held 23-28 August 2020.
- We have released the weights of the ICMLM models pretrained on MS-COCO.
- Our demo, where you can visualize the attention maps of our ICMLM model for a set of MS-COCO (image, caption, maskable token) triplets, is live!
Pretrained models:
Model | Backbone | Dataset | Label Set | Pretrained Model File |
---|---|---|---|---|
ICMLM<sub>att-fc</sub> | ResNet50 | MS-COCO | 5K Nouns + Adjectives + Verbs | icmlm-attfc_r50_coco_5K.pth |
ICMLM<sub>tfm</sub> | ResNet50 | MS-COCO | 5K Nouns + Adjectives + Verbs | icmlm-tfm_r50_coco_5K.pth |
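
To get a first look at one of the released files, a minimal sketch along the following lines may help. It assumes the `.pth` file is a regular PyTorch checkpoint that `torch.load` can read and that it sits in the current directory. The internal layout of the checkpoint (key names, nesting) is not documented here, so the sketch only lists what it finds rather than building a model. The helper names `load_checkpoint` and `summarize` are hypothetical and not part of this repository.

```python
import torch


def load_checkpoint(path):
    """Load a checkpoint onto the CPU and return whatever object it contains."""
    return torch.load(path, map_location="cpu")


def summarize(state, limit=10):
    """Return (name, shape) pairs for up to `limit` tensor entries of a dict."""
    rows = []
    for name, value in state.items():
        if torch.is_tensor(value):
            rows.append((name, tuple(value.shape)))
        if len(rows) >= limit:
            break
    return rows


if __name__ == "__main__":
    # Assumption: the file from the table above was downloaded to this directory.
    ckpt = load_checkpoint("icmlm-attfc_r50_coco_5K.pth")
    if isinstance(ckpt, dict):
        # Top-level keys show whether the weights are stored flat or nested.
        print("top-level keys:", list(ckpt.keys())[:10])
        for name, shape in summarize(ckpt):
            print(name, shape)
    else:
        print("loaded object of type", type(ckpt).__name__)
```

If the top-level keys turn out to be wrappers rather than parameter names, calling `summarize` on the nested dictionary should reveal the actual weight tensors.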
Qualitative results:
BibTeX:
@InProceedings{sariyildiz2020icmlm,
    author = {Sariyildiz, Mert Bulent and Perez, Julien and Larlus, Diane},
    title = {Learning Visual Representations with Caption Annotations},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2020}
}