Fake it till you make it:
Learning transferable representations from synthetic ImageNet clones
CVPR 2023
Mert Bulent Sariyildiz¹˒², Karteek Alahari², Diane Larlus¹, Yannis Kalantidis¹
¹ NAVER LABS Europe, ² Inria
Recent image generation models such as Stable Diffusion have exhibited an impressive ability to generate fairly realistic images starting from a simple text prompt. Could such models render real images obsolete for training image prediction models? In this paper, we answer part of this provocative question by questioning the need for real images when training models for ImageNet classification. Provided only with the class names that have been used to build the dataset, we explore the ability of Stable Diffusion to generate synthetic clones of ImageNet and we measure how useful those are for training classification models from scratch. We show that with minimal and class-agnostic prompt engineering, ImageNet clones are able to close a large part of the gap between models trained on synthetic images and models trained on real images, for the several standard classification benchmarks that we consider in this study. More importantly, we show that models trained on synthetic images exhibit strong generalization properties and perform on par with models trained on real data for transfer.
Figure: Overview of our experimental protocol (two panels: "Training models on synthetic images" and "Evaluating models on real images"). During training, the model has access to synthetic images generated by the Stable Diffusion model, provided with a set of prompts per class. During evaluation, real images are classified by the frozen model.
Performance of ImageNet-SD models
The blue polygon shows the performance of a model trained on ImageNet-1K. The red polygon depicts the performance of a model trained only on synthetic data, generated with Stable Diffusion using the class names of ImageNet-1K. We report top-5 accuracy for all ImageNet test sets, and average top-1 over three groups of transfer datasets.
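The two metrics named in this caption can be illustrated with a minimal sketch. It assumes class scores are available as plain Python lists and that the average over a group of transfer datasets is an unweighted mean; the function names and the toy scores are hypothetical and are not taken from the paper or its code.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += int(label in topk)
    return 100.0 * hits / len(labels)


def mean_over_datasets(accuracies: Sequence[float]) -> float:
    """Unweighted mean of per-dataset accuracies (an assumed way of averaging a group)."""
    return sum(accuracies) / len(accuracies)


# Toy example with 3 samples and 4 classes (made-up scores).
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # highest score: class 1
    [0.5, 0.1, 0.3, 0.1],  # highest score: class 0, second: class 2
    [0.3, 0.1, 0.1, 0.5],  # highest score: class 3, second: class 0
]
toy_labels = [1, 2, 0]
top1 = topk_accuracy(toy_scores, toy_labels, k=1)
top2 = topk_accuracy(toy_scores, toy_labels, k=2)
```

In this toy example only the first sample is correct at k=1, while all three are correct at k=2, which is why top-k accuracy with a larger k is always at least as high as top-1.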
Pretrained Models
We provide two ResNet50 models pretrained on our synthetic ImageNet clones, ImageNet-100-SD and ImageNet-1K-SD. In both cases, we generate images with Stable Diffusion 1.4 using a guidance scale of 2 and prompts composed of the name and description of each class. For more details, please refer to our paper. These models are provided for research purposes only.
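As a rough illustration of prompts "composed of name and description of classes", the sketch below builds one prompt string per class. The exact template (`"name, description"`), the helper `build_prompts`, and the example descriptions are assumptions made for illustration; the paper should be consulted for the prompts actually used.

```python
from typing import Dict, List


def build_prompts(classes: Dict[str, str]) -> List[str]:
    """Build one text prompt per class from its name and description.

    The "name, description" template is an assumption for illustration.
    """
    prompts = []
    for name, description in classes.items():
        name, description = name.strip(), description.strip()
        # Fall back to the bare class name when no description is available.
        prompts.append(f"{name}, {description}" if description else name)
    return prompts


# Hypothetical class entries; the descriptions are illustrative only.
example_classes = {
    "tench": "freshwater dace-like game fish",
    "goldfish": "small golden or orange-red freshwater fish",
}
GUIDANCE_SCALE = 2.0  # value stated in the text for Stable Diffusion 1.4
prompts = build_prompts(example_classes)
```

Each resulting prompt would then be passed to a text-to-image generator together with the guidance scale; that generation step is deliberately left out here.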
| Dataset | Pretrained Model | ImageNet Val Top-1 Acc. (%) | Avg. Transfer Top-1 Acc. (%) |
|---|---|---|---|
| ImageNet-1K-SD | Download | 42.9 | 68.4 |
| ImageNet-100-SD | Download | 73.3 | 63.2 |
You can load these pretrained models with the following code:
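```python
import torch as th
from torchvision.models import resnet50

# Load the checkpoint on CPU; use "imagenet_100_sd.pth" for the ImageNet-100-SD model.
ckpt = th.load("imagenet_1k_sd.pth", map_location="cpu")
net = resnet50()
# The classifier has no bias term; change 1000 to 100 for "imagenet_100_sd.pth".
net.fc = th.nn.Linear(2048, 1000, bias=False)
msg = net.load_state_dict(ckpt, strict=True)
print(msg)  # reports whether all keys in the checkpoint matched the model
net.eval()  # switch to inference mode before evaluating
```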
To evaluate these models on transfer datasets, you can use our transfer learning suite here.
Bibtex
If you find our paper or pretrained models useful for your research, please consider citing us.
```bibtex
@InProceedings{sariyildiz2023fake,
    title     = {Fake it till you make it: Learning transferable representations from synthetic ImageNet clones},
    author    = {Sariyildiz, Mert Bulent and Alahari, Karteek and Larlus, Diane and Kalantidis, Yannis},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023}
}
```