Fake it till you make it:

Learning transferable representations from synthetic ImageNet clones

CVPR 2023

Mert Bulent Sariyildiz1,2, Karteek Alahari2, Diane Larlus1, Yannis Kalantidis1

1 NAVER LABS Europe            2 Inria

Recent image generation models, such as Stable Diffusion, have exhibited an impressive ability to generate fairly realistic images from a simple text prompt. Could such models render real images obsolete for training image prediction models? In this paper, we answer part of this provocative question by questioning the need for real images when training models for ImageNet classification. Provided only with the class names used to build the dataset, we explore the ability of Stable Diffusion to generate synthetic clones of ImageNet, and we measure how useful these are for training classification models from scratch. We show that, with minimal and class-agnostic prompt engineering, ImageNet clones close a large part of the gap between models trained on synthetic images and models trained on real images, across the several standard classification benchmarks considered in this study. More importantly, we show that models trained on synthetic images exhibit strong generalization properties and perform on par with models trained on real data.

Figure: Overview of our experimental protocol (left: training models on synthetic images; right: evaluating models on real images). During training, the model only has access to synthetic images generated by Stable Diffusion from a set of prompts per class. During evaluation, real images are classified by the frozen model.

Performance of ImageNet-SD models


Figure: The blue polygon shows the performance of a model trained on ImageNet-1K. The red polygon depicts the performance of a model trained only on synthetic data, generated with Stable Diffusion using the class names of ImageNet-1K. We report top-5 accuracy for all ImageNet test sets, and average top-1 accuracy over three groups of transfer datasets.

Pretrained Models

We provide two ResNet50 models pretrained on our synthetic ImageNet clones: ImageNet-100-SD and ImageNet-1K-SD. In both cases, we generate images with Stable Diffusion 1.4 using a guidance scale of 2 and prompts composed of the names and descriptions of the classes. For more details, please refer to our paper. These models are provided for research purposes only.
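As a rough illustration of this generation setup, the sketch below shows how such images could be produced with the Hugging Face diffusers library. The `build_prompt` helper and its exact template are hypothetical stand-ins for the class-agnostic prompts described above; only the guidance scale of 2 and the Stable Diffusion 1.4 checkpoint are taken from the text.

```python
def build_prompt(class_name, description):
    # Hypothetical template: class name followed by its description,
    # mirroring the "names and descriptions of the classes" setup above.
    return f"{class_name}, {description}"

def generate_images(class_name, description, n_images=4):
    # Heavy optional dependencies, imported lazily; requires the
    # CompVis/stable-diffusion-v1-4 weights and a CUDA device.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    prompt = build_prompt(class_name, description)
    # guidance_scale=2 matches the setting reported above
    return pipe(prompt, num_images_per_prompt=n_images, guidance_scale=2.0).images

print(build_prompt("tench", "a freshwater fish of the carp family"))
```

The actual prompts used in the paper may differ; this is only meant to make the generation recipe concrete.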

Dataset           Pretrained Model   ImageNet Val Top-1 Acc.   Avg. Transfer Top-1 Acc.
ImageNet-1K-SD    Download           42.9                      68.4
ImageNet-100-SD   Download           73.3                      63.2

You can load these pretrained models with the following code:

import torch as th
from torchvision.models import resnet50

# Load the checkpoint on CPU
ckpt = th.load("imagenet_1k_sd.pth", map_location="cpu")

# Build a ResNet-50 whose classifier matches the pretraining setup:
# a bias-free linear layer with one output per class
net = resnet50()
net.fc = th.nn.Linear(2048, 1000, bias=False)  # change 1000 to 100 for "imagenet_100_sd.pth"

# strict=True ensures every checkpoint key matches a model parameter
msg = net.load_state_dict(ckpt, strict=True)

To evaluate these models on transfer datasets, you can use our transfer learning suite here.


If you find our paper or pretrained models useful for your research, please consider citing us.

@inproceedings{sariyildiz2023fake,
  title={Fake it till you make it: Learning transferable representations from synthetic ImageNet clones},
  author={Sariyildiz, Mert Bulent and Alahari, Karteek and Larlus, Diane and Kalantidis, Yannis},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
