Code repository for the paper "The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better"

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

This GitHub repo is the official codebase used to conduct all experiments in the Unmet Promise paper. If you have any questions, please open a GitHub issue or contact sgeng@cs.washington.edu.

Quickstart

To get started, clone the repository and set up the environment from the provided environment.yml:

git clone https://github.com/scottgeng00/unmet-promise
cd unmet-promise
conda env create -f environment.yml

Code structure

  • The adapt/ folder contains all code for finetuning CLIP models on a downstream vision task given a targeted adaptation dataset.

  • The data_sourcing/ folder contains three submodules for sourcing task-targeted image data.

    • ../synthetic/ contains code for generating targeted synthetic images from a generative text-to-image model $G$ trained on an upstream image-text dataset $D$.
    • ../retrieval/ contains code for subselecting targeted data directly from general image-text pairs; we use this code to retrieve targeted data directly from the generative model's pretraining data $D$.
    • ../filtering/ contains code for filtering all targeted data and deduplicating it against the test set.

Each subfolder contains its own README.md file detailing usage and setup.
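
As a rough illustration of the retrieval idea described above, targeted data can be subselected from a general pool by nearest-neighbor search in an embedding space. The sketch below is not the repository's implementation (see the subfolder README.md files for that). It assumes each image or caption has already been mapped to an embedding vector; the function names `cosine_similarity` and `retrieve_top_k` and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float], pool: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (pool index, similarity) for the k pool vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(pool)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy, made-up embeddings standing in for a general image-text pool (assumption).
pool_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Toy embedding standing in for a task-targeted query, e.g. a class name (assumption).
query_embedding = [1.0, 0.05, 0.0]

top = retrieve_top_k(query_embedding, pool_embeddings, k=2)
top_indices = [index for index, _ in top]
print(top_indices)
```

A brute-force scan like this is only practical for small pools; at the scale of a web dataset, a precomputed approximate kNN index would typically replace the inner loop.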

Coming soon

  • Release of generated synthetic data
  • Release of LAION-2B kNN indices
