
Pic2Word (CVPR2023)

This repository contains an open-source implementation of Pic2Word. It is not an officially supported Google product.

Data

Training Data

We use Conceptual Captions URLs to train the model. See open_clip for the process of downloading the dataset.

The training data directory must be placed in the root of this repo and structured as below.

  cc_data
    ├── train ## training image directories.
    └── val ## validation image directories.
  cc
    ├── Train_GCC-training_output.csv ## training data list
    └── Validation_GCC-1.1.0-Validation_output.csv ## validation data list
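
Before launching training, it can help to confirm the layout above is in place. The following is an illustrative helper (not part of the repo) that checks for the expected directories and CSV lists:

```python
import os

# Expected layout relative to the repo root, per the tree above.
EXPECTED_DIRS = ["cc_data/train", "cc_data/val"]
EXPECTED_FILES = [
    "cc/Train_GCC-training_output.csv",
    "cc/Validation_GCC-1.1.0-Validation_output.csv",
]

def check_cc_layout(root="."):
    """Return a list of missing paths; an empty list means the layout is complete."""
    missing = [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]
    missing += [f for f in EXPECTED_FILES if not os.path.isfile(os.path.join(root, f))]
    return missing
```

Running check_cc_layout() from the repo root before training surfaces missing data early instead of failing mid-run.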

Test Data

See the README to prepare the test datasets.

Training

Install dependencies

See open_clip for installation details. The same environment should work for this repo. setenv.sh is the script we used to set up the environment in virtualenv.

Then run the following to activate the environment and add the source directory to PYTHONPATH:

. env3/bin/activate
export PYTHONPATH="$PYTHONPATH:$PWD/src"
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'
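
As a quick sanity check after sourcing the environment, you can verify that a src directory is actually on PYTHONPATH. This is an illustrative snippet, not part of the repo:

```python
import os

def src_on_path(pythonpath=None):
    """Return True if some entry on PYTHONPATH ends with 'src' (illustrative check)."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    return any(p.rstrip("/").endswith("src") for p in pythonpath.split(":") if p)
```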

Pre-trained model

The model is available on Google Drive.

Sample training command:

python -u src/main.py \
    --save-frequency 1 \
    --train-data="cc/Train_GCC-training_output.csv"  \
    --warmup 10000 \
    --batch-size=128 \
    --lr=1e-4 \
    --wd=0.1 \
    --epochs=30 \
    --workers=8 \
    --openai-pretrained \
    --model ViT-L/14
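
The --train-data CSV lists image paths and captions. As an illustration of how such a file can be consumed, the sketch below assumes comma-separated columns named filepath and title; match the column names and delimiter to whatever the open_clip preprocessing scripts actually emit:

```python
import csv

def load_pairs(csv_path, img_key="filepath", caption_key="title"):
    """Yield (image path, caption) pairs from a training CSV.

    NOTE: the column names and delimiter here are assumptions; adjust them
    to match the CSV produced by the open_clip preprocessing scripts.
    """
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            yield row[img_key], row[caption_key]
```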

Sample evaluation only:

Evaluation on COCO, ImageNet, or CIRR (set $data_name to coco, imgnet, or cirr):

python src/eval_retrieval.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --eval-mode $data_name \
    --gpu $gpu_id \
    --model ViT-L/14
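
The retrieval benchmarks report recall-style metrics. As an illustration only (not the repo's exact evaluation code), Recall@K over ranked candidate lists can be computed as:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries whose top-k ranked candidates contain a relevant item.

    ranked_ids: list of ranked candidate-id lists, one per query.
    relevant_ids: list of sets of relevant ids, one per query.
    """
    hits = sum(
        1 for ranks, rel in zip(ranked_ids, relevant_ids)
        if any(c in rel for c in ranks[:k])
    )
    return hits / len(ranked_ids)
```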

Evaluation on Fashion-IQ (set $cloth_type to shirt, dress, or toptee):

python src/eval_retrieval.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --eval-mode fashion \
    --source $cloth_type \
    --gpu $gpu_id \
    --model ViT-L/14

Demo:

Set $data_name to one of coco, imgnet, cirr, dress, shirt, or toptee. Pass query images as a comma-separated list, and use * in each prompt to mark the token to be replaced with the image token (e.g., "a sketch of *"). $path_demo is the output directory for the generated HTML file and images.

python src/demo.py \
    --openai-pretrained \
    --resume /path/to/checkpoints \
    --retrieval-data $data_name \
    --query_file "path_img1,path_img2,path_img3..." \
    --prompts "prompt1,prompt2,..." \
    --demo-out $path_demo \
    --gpu $gpu_id \
    --model ViT-L/14

The demo generates a directory containing an HTML file and an image subdirectory. Download the directory and open the HTML file to view the results.
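
The * placeholder marks where the image-derived pseudo-word token is inserted into the prompt. As an illustration of the string-level substitution (the actual Pic2Word token is a learned embedding, so "[IMG]" below is only a stand-in):

```python
def expand_prompt(prompt, image_token="[IMG]"):
    """Replace the '*' placeholder in a demo prompt with an image token.

    "[IMG]" is an illustrative stand-in; Pic2Word maps the query image to a
    learned pseudo-word embedding at this position.
    """
    if "*" not in prompt:
        raise ValueError("prompt must contain '*' to mark the image token")
    return prompt.replace("*", image_token)
```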

Citing

If you find this repository useful, please consider citing:

@inproceedings{saito2023pic2word,
  title={Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval},
  author={Saito, Kuniaki and Sohn, Kihyuk and Zhang, Xiang and Li, Chun-Liang and Lee, Chen-Yu and Saenko, Kate and Pfister, Tomas},
  booktitle={CVPR},
  year={2023}
}
