Fake it till you make it:

Learning transferable representations from synthetic ImageNet clones

CVPR 2023

Mert Bulent Sariyildiz1,2, Karteek Alahari2, Diane Larlus1, Yannis Kalantidis1

1 NAVER LABS Europe            2 Inria

Recent image generation models, such as Stable Diffusion, have exhibited an impressive ability to generate fairly realistic images from a simple text prompt. Could such models render real images obsolete for training image prediction models? In this paper, we answer part of this provocative question by questioning the need for real images when training models for ImageNet classification. Provided only with the class names used to build the dataset, we explore the ability of Stable Diffusion to generate synthetic clones of ImageNet, and we measure how useful these are for training classification models from scratch. We show that, with minimal and class-agnostic prompt engineering, ImageNet clones close a large part of the gap between models trained on synthetic images and models trained on real images, across the several standard classification benchmarks considered in this study. More importantly, we show that models trained on synthetic images exhibit strong generalization properties and perform on par with models trained on real data.

Figure: Overview of our experimental protocol (left: training models on synthetic images; right: evaluating models on real images). During training, the model only has access to synthetic images generated by Stable Diffusion from a set of prompts per class. During evaluation, real images are classified by the frozen model.

Performance of ImageNet-SD models


Figure: The blue polygon shows the performance of a model trained on ImageNet-1K. The red polygon depicts the performance of a model trained only on synthetic data, generated with Stable Diffusion using the class names of ImageNet-1K. We report top-5 accuracy for all ImageNet test sets, and average top-1 accuracy over three groups of transfer datasets.

Pretrained Models

We provide two ResNet50 models pretrained on our synthetic ImageNet clones: ImageNet-100-SD and ImageNet-1K-SD. In both cases, we generate images with Stable Diffusion 1.4 using a guidance scale of 2 and prompts composed of the names and descriptions of the classes. For more details, please refer to our paper. These models are provided for research purposes only.
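As a rough illustration of this generation setup, the sketch below shows how such images could be produced with the Hugging Face diffusers library. The `build_prompt` helper and its exact template are hypothetical stand-ins for the class-agnostic prompts described above; only the guidance scale of 2 and the Stable Diffusion 1.4 checkpoint are taken from the text.

```python
def build_prompt(class_name, description):
    # Hypothetical template: class name followed by its description,
    # mirroring the "names and descriptions of the classes" setup above.
    return f"{class_name}, {description}"

def generate_images(class_name, description, n_images=4):
    # Heavy optional dependencies, imported lazily; requires the
    # CompVis/stable-diffusion-v1-4 weights and a CUDA device.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    prompt = build_prompt(class_name, description)
    # guidance_scale=2 matches the setting reported above
    return pipe(prompt, num_images_per_prompt=n_images, guidance_scale=2.0).images

print(build_prompt("tench", "a freshwater fish of the carp family"))
```

The actual prompts used in the paper may differ; this is only meant to make the generation recipe concrete.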

Dataset           Pretrained Model   ImageNet Val Top-1 Acc.   Avg. Transfer Top-1 Acc.
ImageNet-1K-SD    Download           42.9                      68.4
ImageNet-100-SD   Download           73.3                      63.2

You can load these pretrained models with the following code:

import torch as th
from torchvision.models import resnet50

# Load the checkpoint on CPU
ckpt = th.load("imagenet_1k_sd.pth", map_location="cpu")

# Build a ResNet-50 whose classifier matches the pretraining setup:
# a bias-free linear layer with one output per class
net = resnet50()
net.fc = th.nn.Linear(2048, 1000, bias=False)  # change 1000 to 100 for "imagenet_100_sd.pth"

# strict=True ensures every checkpoint key matches a model parameter
msg = net.load_state_dict(ckpt, strict=True)

To evaluate these models on transfer datasets, you can use our transfer learning suite here.


If you find our paper or pretrained models useful for your research, please consider citing us.

@inproceedings{sariyildiz2023fake,
  title={Fake it till you make it: Learning transferable representations from synthetic ImageNet clones},
  author={Sariyildiz, Mert Bulent and Alahari, Karteek and Larlus, Diane and Kalantidis, Yannis},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
