Improving the Generalization of Supervised Models

Mert Bulent Sariyildiz1,2, Yannis Kalantidis1, Karteek Alahari2, Diane Larlus1

1 NAVER LABS Europe            2 Inria

Figure: ImageNet-1K accuracy versus transfer performance of the models evaluated in our paper.

ImageNet-1K (IN1K) vs. transfer-task performance for ResNet50. We report IN1K top-1 accuracy and transfer performance (log odds) averaged over 13 datasets (the 5 ImageNet-CoG concept generalization datasets, Aircraft, Cars196, DTD, EuroSAT, Flowers, Pets, Food101 and SUN397) for a large number of models trained with the supervised training setup we propose. Models on the convex hull are denoted by stars. We compare to public state-of-the-art (SotA) models: the supervised RSB-A1 and SupCon models, the self-supervised DINO, the semi-supervised PAWS, and a variant of LOOK using multi-crop.

We consider the problem of training a deep neural network on a given classification task, e.g., ImageNet-1K (IN1K), so that it excels at that task as well as at other (future) transfer tasks. These two seemingly contradictory properties impose a trade-off between improving the model’s generalization while maintaining its performance on the original task. Models trained with self-supervised learning (SSL) tend to generalize better than their supervised counterparts for transfer learning; yet, they still lag behind supervised models on IN1K. In this paper, we propose a supervised learning setup that leverages the best of both worlds. We enrich the common supervised training framework using two key components of recent SSL models: multi-scale crops for data augmentation and the use of an expendable projector head. We replace the last layer of class weights with class prototypes computed on the fly using a memory bank. We show that these three improvements lead to a more favorable trade-off between the IN1K training task and 13 transfer tasks. Over all the explored configurations, we single out two models: t-ReX that achieves a new state of the art for transfer learning and outperforms top methods such as DINO and PAWS on IN1K, and t-ReX* that matches the highly optimized RSB-A1 model on IN1K while performing better on transfer tasks.
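The projector head mentioned above can be sketched in a few lines. This is a minimal illustration of the general idea, not the paper's exact configuration: during training, encoder features pass through a small MLP before the prototype-based loss, and at transfer time the projector is discarded. The layer sizes and the `Projector` name are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of an expendable projector head: a small MLP applied to encoder
# features during pretraining only. Dimensions are illustrative assumptions.
class Projector(nn.Module):
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        # L2-normalize so embeddings can be compared to class prototypes
        # by cosine similarity.
        return F.normalize(self.mlp(x), dim=-1)

encoder_feats = torch.randn(8, 2048)  # a batch of ResNet50 pooled features
z = Projector()(encoder_feats)
print(z.shape)  # projected, unit-norm embeddings
```

At transfer time only the 2048-d encoder features are used; the projector (and the class prototypes) are thrown away.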

Pretrained t-ReX Models

See the table for the links to the pretrained t-ReX and t-ReX* models with a ResNet50 encoder trained on ImageNet-1K for 100 epochs.
More models and code will be released in the future. If you use our models, please cite our paper.

Model     ImageNet-1K top-1 accuracy    Mean transfer accuracy (log odds)    Weights
t-ReX     78.0                          1.357                                Checkpoint
t-ReX*    80.2                          1.078                                Checkpoint

You can load a checkpoint with the following simple code snippet:

import torch as th
from torchvision.models import resnet50

# Load the checkpoint on CPU; it contains the encoder weights only.
ckpt = th.load("trex.pth", "cpu")
net = resnet50(pretrained=False)
msg = net.load_state_dict(ckpt, strict=False)
# Only the final classification layer should be missing from the checkpoint,
# and the checkpoint should contain no extra keys.
assert msg.missing_keys == ["fc.weight", "fc.bias"] and msg.unexpected_keys == []


Citation

@article{sariyildiz2022improving,
    title={Improving the Generalization of Supervised Models},
    author={Sariyildiz, Mert Bulent and Kalantidis, Yannis and Alahari, Karteek and Larlus, Diane},
}
