This repository provides the code for our paper at AAAI 2021:
Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval. Jinpeng Wang, Bin Chen, Qiang Zhang, Zaiqiao Meng, Shangsong Liang, Shu-Tao Xia. [link].
We propose WSDHQ, a weakly supervised deep quantization approach for image retrieval. Instead of requiring ground-truth labels, WSDHQ leverages the informal tags provided by amateur users to guide quantization learning, which reduces the reliance on manual annotations and makes industrial deployment more feasible. In WSDHQ, we propose a correlation-based tag processing mechanism to enhance the weak semantics of such noisy tags. In addition, we learn quantized representations on the hypersphere manifold, on which we design a novel adaptive cosine margin loss for embedding learning and a supervised cosine quantization loss for quantization. Experiments on the Flickr25K and NUS-WIDE datasets demonstrate the superiority of WSDHQ.
In the following, we guide you through using this repository step by step. 🤗
git clone https://github.com/gimpong/AAAI21-WSDHQ.git
cd AAAI21-WSDHQ/
tar -xvzf data.tar.gz
rm -f data.tar.gz
- python 3.7.8
- numpy 1.19.1
- scikit-learn 0.23.1
- h5py 2.10.0
- python-opencv 3.4.2
- tqdm 4.51.0
- tensorflow 1.15.0
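As a quick sanity check of the environment, the pinned versions above can be compared against what is actually installed. The following is a minimal sketch and not part of the repository: the helper name `check_versions` is hypothetical, and the import names (`sklearn` for scikit-learn, `cv2` for OpenCV) are assumed to be the usual ones for these packages.

```python
import importlib

# Pinned versions from the list above, keyed by import name.
# `sklearn` and `cv2` are assumed import names for scikit-learn and OpenCV.
EXPECTED = {
    "numpy": "1.19.1",
    "sklearn": "0.23.1",
    "h5py": "2.10.0",
    "cv2": "3.4.2",
    "tqdm": "4.51.0",
    "tensorflow": "1.15.0",
}


def check_versions(expected):
    """Return {module: (expected, found)} for every mismatch.

    `found` is None when the module cannot be imported or
    does not expose a `__version__` attribute.
    """
    problems = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems[name] = (wanted, None)
            continue
        found = getattr(module, "__version__", None)
        # Prefix match, so a build suffix such as "3.4.2.17" is accepted.
        if found is None or not str(found).startswith(wanted):
            problems[name] = (wanted, found)
    return problems
```

Calling `check_versions(EXPECTED)` returns an empty dictionary when every package matches its pinned version; otherwise each entry shows the expected and the installed version.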
Before running the code, we need to make sure that everything needed is ready. First, the working directory is expected to be organized as below:
AAAI21-WSDHQ/
- data/
- flickr25k/
- tags
- FinalTagEmbs.txt
- TagIdMergeMap.pkl
- common_tags.txt
- database_img.txt
- database_label.txt
- train_img.txt
- train_tag.txt
- test_img.txt
- test_label.txt
- nus-wide/
- tags
- FinalTagEmbs.txt
- TagIdMergeMap.pkl
- TagList1k.txt
- database_img.txt
- database_label.txt
- train_img.txt
- train_tag.txt
- test_img.txt
- test_label.txt
- datasets/
- GoogleNews-vectors-negative300.bin.gz
- flickr25k/
- mirflickr/
- im1.jpg
- im2.jpg
- ...
- nus-wide/
- Flickr/
- actor/
- 0001_2124494179.jpg
- 0002_174174086.jpg
- ...
- administrative_assistant/
- ...
- ...
- scripts/
- run0001.sh
- run0002.sh
- ...
- tag_processing.sh
- train.py
- validation.py
- net.py
- net_val.py
- util.py
- dataset.py
- alexnet.npy
- The data/ folder is the collection of data splits for the Flickr25K and NUS-WIDE datasets. The raw images of the Flickr25K and NUS-WIDE datasets should be downloaded separately and arranged in datasets/flickr25k/ and datasets/nus-wide/ respectively. We provide copies of these image datasets, which you can download via Google Drive or Baidu Wangpan (Web Drive, password: ocmv).
- The pre-trained files of AlexNet (alexnet.npy) and Word2Vec (GoogleNews-vectors-negative300.bin.gz) can be downloaded from Baidu Wangpan (Web Drive, password: ocmv).
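Before launching a long run, it can help to confirm that the expected files are in place. The sketch below is not part of the repository: `missing_paths` is a hypothetical helper, and the path list covers only locations named in this README. Placing `alexnet.npy` in the repository root next to `train.py` is an assumption.

```python
import os

# Locations mentioned in this README, relative to the repository root.
# Putting alexnet.npy in the root is an assumption; adjust if your copy differs.
EXPECTED_PATHS = [
    "data/flickr25k/tags",
    "data/nus-wide/tags",
    "datasets/flickr25k",
    "datasets/nus-wide",
    "scripts/tag_processing.sh",
    "train.py",
    "validation.py",
    "alexnet.npy",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under `root`."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]
```

For example, `missing_paths("AAAI21-WSDHQ")` returns an empty list once all listed files and folders are present.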
We have provided enhanced tag embeddings in this repository. See data/flickr25k/tags/ and data/nus-wide/tags/.
If you want to reproduce these files, you can remove them and execute:
cd scripts/
# '0' is the id of GPU
bash tag_processing.sh 0
To facilitate reproducibility, we provide scripts with the configuration for each experiment. The scripts can be found under the scripts/ folder. For example, to train and evaluate an 8-bit WSDHQ model on the Flickr25K dataset, run:
cd scripts/
# '0' is the id of GPU
bash run0001.sh 0
The script run0001.sh includes the following commands:
#!/bin/bash
cd ..
##8 bits
# dataset lr iter lambda subspace_num loss notes gpu
python train.py flickr 0.0003 800 0.0001 1 WSDQH 0001 $1
# dataset model_weight gpu
python validation.py flickr ./checkpoints/flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.npy $1
cd -
After running a script, a series of files will be saved under logs/ and checkpoints/. Take run0001.sh as an example:
AAAI21-WSDHQ/
- logs/
- flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.log
- checkpoints/
- flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.npy
- flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001_retrieval.h5
- ...
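The output file names encode the run configuration, which is handy when collecting results from several scripts. As a sketch, the fields can be recovered with a regular expression. The helper `parse_run_name` and its field names are hypothetical, and the pattern is inferred only from the file names shown in this README.

```python
import re

# Pattern inferred from names such as
# flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.log
RUN_NAME = re.compile(
    r"^(?P<dataset>[^_]+)_(?P<loss>[^_]+)_nbits=(?P<nbits>\d+)"
    r"_adaMargin_gamma=(?P<gamma>[^_]+)_lambda=(?P<lam>[^_]+)"
    r"_(?P<notes>[^_.]+)(?P<suffix>_retrieval)?\.(?P<ext>log|npy|h5)$"
)


def parse_run_name(filename):
    """Split a log/checkpoint file name into its configuration fields.

    Returns None when the name does not follow the inferred pattern.
    """
    match = RUN_NAME.match(filename)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; the rest stay as strings.
    fields["nbits"] = int(fields["nbits"])
    fields["gamma"] = float(fields["gamma"])
    fields["lam"] = float(fields["lam"])
    return fields
```

For instance, parsing the log name of run0001.sh yields `dataset == "flickr"`, `nbits == 8` and `notes == "0001"`.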
Here we report the results of running the scripts on a GTX 1080 Ti, shown in the following table. We have also uploaded the logs and checkpoint information for reference, which can be downloaded from Baidu Wangpan (Web Drive, password: ocmv).
Note that some values may deviate slightly from the results reported in our original paper. This is caused by randomness in TensorFlow and by differences in software and hardware.
Script | Dataset | Code Length (bits) | MAP | Log |
---|---|---|---|---|
run0001.sh | Flickr25K | 8 | 0.766 | flickr_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0001.log |
run0002.sh | Flickr25K | 16 | 0.755 | flickr_WSDQH_nbits=16_adaMargin_gamma=1_lambda=0.0001_0002.log |
run0003.sh | Flickr25K | 24 | 0.765 | flickr_WSDQH_nbits=24_adaMargin_gamma=1_lambda=0.0001_0003.log |
run0004.sh | Flickr25K | 32 | 0.767 | flickr_WSDQH_nbits=32_adaMargin_gamma=1_lambda=0.0001_0004.log |
run0005.sh | NUS-WIDE | 8 | 0.717 | nuswide_WSDQH_nbits=8_adaMargin_gamma=1_lambda=0.0001_0005.log |
run0006.sh | NUS-WIDE | 16 | 0.727 | nuswide_WSDQH_nbits=16_adaMargin_gamma=1_lambda=0.0001_0006.log |
run0007.sh | NUS-WIDE | 24 | 0.730 | nuswide_WSDQH_nbits=24_adaMargin_gamma=1_lambda=0.0001_0007.log |
run0008.sh | NUS-WIDE | 32 | 0.729 | nuswide_WSDQH_nbits=32_adaMargin_gamma=1_lambda=0.0001_0008.log |
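For readers unfamiliar with the metric, MAP (mean average precision) averages, over all queries, the precision measured at each rank where a relevant item is retrieved. The toy sketch below illustrates this general definition in plain Python. It is not the evaluation code in validation.py, which may differ in details such as the ranking cutoff or how relevance is derived from labels; the function names are chosen here for illustration.

```python
def average_precision(relevance):
    """AP for one query; `relevance` is a ranked list of 0/1 flags."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items so far / items so far.
            precision_sum += hits / rank
    # A query with no relevant result contributes 0 in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_relevance):
    """MAP over a list of per-query relevance lists."""
    if not ranked_relevance:
        return 0.0
    total = sum(average_precision(r) for r in ranked_relevance)
    return total / len(ranked_relevance)
```

With the ranked lists `[1, 0, 1]` and `[0, 1]`, the per-query values are 5/6 and 1/2, so the MAP is 2/3.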
If you find this code useful or use the toolkit in your work, please consider citing:
@inproceedings{wang2021wsdhq,
title={Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval},
author={Wang, Jinpeng and Chen, Bin and Zhang, Qiang and Meng, Zaiqiao and Liang, Shangsong and Xia, Shutao},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={4},
pages={2755--2763},
year={2021}
}
We use DeepHash as the code base in our implementation.
If you have any questions, please raise an issue or email Jinpeng Wang (wjp20@mails.tsinghua.edu.cn). We will reply as soon as possible.