A PyTorch implementation of IRGen based on the paper IRGen: Generative Modeling for Image Retrieval.
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
pip install -r requirements.txt
CARS196, CUB200-2011 and In-shop Clothes are used in this repo.
You should download these datasets by yourself, and extract them into data/car
, data/cub
, data/isc
directory. For each dataset, run the following instruction to generate the ground truth file.
python gnd_generater.py
python train_tokenizer.py
optional arguments:
--data_path datasets path [default value is 'data']
--data_name dataset name [default value is 'isc'](choices=['car', 'cub', 'isc', 'imagenet', 'places'])
--feats initialize features for quantize [default value is '']
--output_dir saving output direction [default value is 'results']
--lr train learning rate [default value is 5e-4]
--batch_size train batch size [default value is 128]
--num_epochs train epoch number [default value is 200]
--rq_weight loss weight for rq reconsturcted features
python rq.py --features 'isc_features.npy' --file_name 'isc_rq.pkl' --data_dir 'data/isc'
optional arguments:
--data_name dataset name [default value is 'isc'](choices=['car', 'cub', 'isc', 'imagenet', 'places'])
python -m torch.distributed.launch --nproc_per_node=8 train_ar.py --file_name 'in-shop_clothes_retrieval_trainval.pkl' --codes 'isc_rq.pkl'
optional arguments:
--data_dir datasets path [default value is 'data/isc/Img']
--data_name dataset name [default value is 'isc'](choices=['car', 'cub', 'isc', 'imagenet', 'places'])
--output_dir saving output direction [default value is 'results']
--lr train learning rate [default value is 8e-5]
--batch_size train batch size [default value is 64]
--num_epochs train epoch number [default value is 200]
--smoothing smoothing value for label smoothing [default value is 0.1]
python test_ar.py --file_name 'in-shop_clothes_retrieval_trainval.pkl' --codes 'isc_rq.pkl' --model_dir 'results/isc_rq_e200.pkl'
optional arguments:
--data_dir datasets path [default value is 'data/isc/Img']
--data_name dataset name [default value is 'isc'](choices=['car', 'cub', 'isc', 'imagenet', 'places'])
--beam_size size for beam search [default value is 30]
--ks query number for test@k[default value is [1,10,20,30]]
The models are trained on 8 NVIDIA Tesla V100 (32G) GPU.
P refers to precision, R refers to recall.
P@1 | P@10 | P@20 | P@30 | R@1 | R@10 | R@20 | R@30 |
---|---|---|---|---|---|---|---|
92.4% | 87.4% | 87.0% | 86.9% | 92.4% | 96.8% | 97.6% | 97.9% |
P@1 | P@2 | P@4 | P@8 | R@1 | R@2 | R@4 | R@8 |
---|---|---|---|---|---|---|---|
82.7% | 82.7% | 83.0% | 82.8% | 82.7% | 86.4% | 89.2% | 91.4% |
P@1 | P@2 | P@4 | P@8 | R@1 | R@2 | R@4 | R@8 |
---|---|---|---|---|---|---|---|
90.1% | 89.9% | 90.2% | 90.5% | 90.1% | 92.1% | 93.2% | 93.7% |
The Precision-Recall curve.