Concept generalization in visual representation learning
Mert Bulent Sariyildiz1,2, Yannis Kalantidis1, Diane Larlus1, Karteek Alahari2
1 NAVER LABS Europe 2 Inria
ICCV 2021
Measuring concept generalization, i.e., the extent to which models trained on a set of (seen) visual concepts can be leveraged to recognize a new set of (unseen) concepts, is a popular way of evaluating visual representations, especially in a self-supervised learning framework. Nonetheless, the choice of unseen concepts for such an evaluation is usually made arbitrarily, and independently of the seen concepts used to train representations, thus ignoring any semantic relationships between the two. In this paper, we argue that the semantic relationships between seen and unseen concepts affect generalization performance and propose ImageNet-CoG, a novel benchmark on the ImageNet-21K (IN-21K) dataset that enables measuring concept generalization in a principled way. Our benchmark leverages expert knowledge that comes from WordNet in order to define a sequence of unseen IN-21K concept sets that are semantically more and more distant from the ImageNet-1K (IN-1K) subset, a ubiquitous training set. This allows us to benchmark visual representations learned on IN-1K out of the box. We conduct a large-scale study encompassing 31 convolutional and transformer-based models and show how different architectures, levels of supervision, regularization techniques, and use of web data impact the concept generalization performance.
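The benchmark orders unseen concepts by their semantic distance from IN-1K in the WordNet hierarchy. As a purely illustrative sketch (not the paper's exact measure, which uses the full IN-21K/WordNet hierarchy), the snippet below computes a simple path-based distance between two concepts in a toy is-a graph; the graph and all concept names are made up for this example:

```python
# Toy hypernym graph (illustrative only): child -> parent edges,
# standing in for WordNet's is-a hierarchy used by ImageNet-CoG.
hypernyms = {
    "tabby_cat": "cat",
    "cat": "carnivore",
    "dog": "carnivore",
    "carnivore": "mammal",
    "trout": "fish",
    "fish": "vertebrate",
    "mammal": "vertebrate",
}

def ancestors(node):
    """Return the chain [node, parent, grandparent, ..., root]."""
    chain = [node]
    while node in hypernyms:
        node = hypernyms[node]
        chain.append(node)
    return chain

def semantic_distance(a, b):
    """Number of edges on the path from a to b through their
    lowest common ancestor, or None if they share no ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    depth_b = {n: i for i, n in enumerate(up_b)}
    for i, n in enumerate(up_a):
        if n in depth_b:
            return i + depth_b[n]
    return None

print(semantic_distance("tabby_cat", "dog"))  # -> 3 (via carnivore)
```

Ranking IN-21K concepts by such a distance to the seen IN-1K concepts is what allows the benchmark to define levels that are "semantically more and more distant" from the training set.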
Citation:
If you find our paper interesting, please consider citing us:
@InProceedings{sariyildiz2021conceptgeneralization,
    title={Concept Generalization in Visual Representation Learning},
    author={Sariyildiz, Mert Bulent and Kalantidis, Yannis and Larlus, Diane and Alahari, Karteek},
    booktitle={International Conference on Computer Vision},
    year={2021}
}
Benchmark results:
In the paper, we evaluate 31 state-of-the-art models on the ImageNet-CoG benchmark. We use ResNet50 as a reference model. The remaining 30 models are divided into four categories.
Architecture: Models with different backbone architectures. Those with a similar (resp. dissimilar) number of parameters to ResNet50 are colored in red (resp. orange).
- T2T-ViT-t-14, visual transformer model
- DeiT-S, visual transformer model
- DeiT-S-distilled, distilled DeiT-S
- Inception-v3, CNN with inception modules
- NAT-M4, neural architecture search model
- EfficientNet-B1, neural architecture search model
- EfficientNet-B4, bigger EfficientNet-B1
- DeiT-B-distilled, bigger DeiT-S-distilled
- ResNet152, bigger ResNet50
- VGG-19, simple CNN architecture
Self-supervision: ResNet50 models trained with self-supervised learning.
- SimCLR-v2, online instance discrimination (ID)
- MoCo-v2, ID with momentum encoder and memory bank
- BYOL, negative-free ID with momentum encoder
- MoCHi, ID with negative pair mining
- InfoMin, ID with careful positive pair selection
- OBoW, online bag-of-visual-words prediction
- SwAV, online clustering
- DINO, online clustering
- BarlowTwins, feature de-correlation using positive pairs
- CompReSS, distilled model from SimCLR-v1 (with ResNet50x4)
Regularization: ResNet50 models trained with additional regularization.
- MixUp, label-associated augmentation in input space
- Manifold-MixUp, label-associated augmentation in representation space
- CutMix, label-associated augmentation in input space
- ReLabel, model trained on a “multi-label” version of IN-1K
- Adv-Robust, adversarially robust model
- MEAL-v2, distilled ResNet50
Use of web data: ResNet50 models trained using additional web data.
- MoPro, trained on Webvision-V1
- Semi-Sup, semi-supervised model first pretrained on YFCC-100M, then fine-tuned on IN-1K
- Semi-Weakly-Sup, semi-weakly supervised model first pretrained on IG-1B, then fine-tuned on IN-1K
- CLIP, vision & language model trained on WebImageText.
Results
Benchmark files
These two files contain the concepts and data splits for ImageNet-CoG:
- cog_concepts_split_file.pkl: List of image filenames in the train and test splits for all 5000 ImageNet concepts in the CoG levels (~678MB).
- cog_levels_mapping_file.pkl: List of ImageNet concept names for each ImageNet-CoG level (~100KB).
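As a minimal sketch of how these files might be consumed, assuming (this structure is not guaranteed by the descriptions above) that the levels file maps each CoG level to a list of concept IDs and the splits file maps each concept to its train/test image filenames, one could load them with Python's pickle module. The snippet writes tiny stand-in files so it runs end to end; with the real downloads, only the paths change:

```python
import os
import pickle
import tempfile

# Dummy stand-ins with the *assumed* structure of the two benchmark files.
levels = {"L1": ["n02119789", "n02100735"], "L2": ["n01580077"]}
splits = {"n02119789": {"train": ["n02119789_1.JPEG"],
                        "test": ["n02119789_2.JPEG"]}}

tmp = tempfile.mkdtemp()
levels_path = os.path.join(tmp, "cog_levels_mapping_file.pkl")
splits_path = os.path.join(tmp, "cog_concepts_split_file.pkl")
for path, obj in [(levels_path, levels), (splits_path, splits)]:
    with open(path, "wb") as f:
        pickle.dump(obj, f)

# Loading: with the real files, point these paths at the downloads.
with open(levels_path, "rb") as f:
    cog_levels = pickle.load(f)
with open(splits_path, "rb") as f:
    cog_splits = pickle.load(f)

print(sorted(cog_levels))                      # -> ['L1', 'L2']
print(cog_splits["n02119789"]["train"])        # -> ['n02119789_1.JPEG']
```

Please inspect the actual pickle contents after downloading, as the exact keys and nesting used here are an assumption for illustration.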
If the download doesn’t start automatically when you click the links above, please copy the links directly into your browser.