This is the official code for YORO - Lightweight End-to-End Visual Grounding, accepted at the European Conference on Computer Vision (ECCV) 2022 Workshop on International Challenge on Compositional and Multimodal Perception, Tel Aviv, Israel.
Use environment/environment.yml or environment/environment_cuda102.yml, depending on your CUDA version, to create the environment:
conda env create -f environment/environment.yml
conda activate yoro
python -m spacy download en_core_web_sm
- Comment out any dataset in download.sh that is not needed; downloading all datasets takes a few hours
- Download the datasets to the "./dataset/raw" folder by running
sh download.sh
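Disabling an unwanted dataset amounts to prefixing its line in download.sh with `#`. A minimal sketch of doing this with sed, using a stand-in file since the real script contents are not shown here (the `download_refclef.sh` entry below is hypothetical):

```shell
# Stand-in for download.sh; the real entries will differ.
cat > /tmp/download_demo.sh <<'EOF'
sh download_refcoco.sh
sh download_refclef.sh
EOF
# Comment out the dataset you do not need (here: refclef).
# In sed, "&" re-inserts the matched text after the "# " prefix.
sed -i 's|^sh download_refclef.sh|# &|' /tmp/download_demo.sh
cat /tmp/download_demo.sh
```

Editing download.sh by hand in a text editor works just as well; the sed form is only convenient for scripted setups.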
Convert the raw data to Arrow format:
- Comment out any dataset in preprocess_dataset.py that is not needed
python preprocess_dataset.py
- The preprocessed dataset will be stored in "./dataset/arrow"
cd pretrained_weight
sh download_weight.sh
- Download result.zip from Google Drive
- Unzip it: unzip result.zip
For each eval.sh file under script/DATASET, change the "debug" flag to False to run the full evaluation. Below, we describe how to run eval.sh for the different datasets.
sh script/pretrain/eval.sh
sh script/RefCoco/eval.sh
sh script/RefCocoP/eval.sh
sh script/RefCocog/eval.sh
sh script/copsref/eval.sh
sh script/ReferItGame/eval.sh
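The "debug" flag can be flipped by hand or scripted. A minimal sketch assuming the flag appears in eval.sh as `debug=True` (the exact syntax in the real scripts may differ, so check before applying this):

```shell
# Stand-in for an eval.sh; the real flag syntax may differ.
printf 'python run.py with task_eval debug=True\n' > /tmp/eval_demo.sh
# Switch off debug mode to run the full evaluation
sed -i 's/debug=True/debug=False/' /tmp/eval_demo.sh
cat /tmp/eval_demo.sh
```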
For all run.sh files, please change the "debug" flag to False to run the full training.
For modulated detection pretraining, we start from an MLM-ITM pretrained model, such as the ViLT pretraining checkpoint. For example, the script below trains with 5 det tokens for 40 epochs on 1 GPU. Please refer to the comments in the script for more details.
sh script/pretrain/run.sh 5 40 1
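The three numbers passed to run.sh are the det-token count, the epoch count, and the GPU count. A hypothetical sketch of how such a script might consume them (variable names are assumptions, not taken from the actual run.sh):

```shell
# Simulate "sh script/pretrain/run.sh 5 40 1" by setting positional args.
set -- 5 40 1
NUM_DET_TOKENS=$1   # number of det tokens
MAX_EPOCHS=$2       # training epochs
NUM_GPUS=$3         # number of GPUs
echo "det_tokens=$NUM_DET_TOKENS epochs=$MAX_EPOCHS gpus=$NUM_GPUS"
```

The same three-argument pattern applies to every run.sh in the sections below.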
For the RefCoco dataset, we load the pretraining checkpoint as the initial weights. For example, the script below trains with 5 det tokens for 10 epochs on 1 GPU. Please refer to the comments in the script for more details.
sh script/RefCoco/run.sh 5 10 1
For the RefCoco+ dataset, we load the pretraining checkpoint as the initial weights. For example, the script below trains with 5 det tokens for 10 epochs on 1 GPU. Please refer to the comments in the script for more details.
sh script/RefCocoP/run.sh 5 10 1
For the RefCocog dataset, we load the pretraining checkpoint as the initial weights. For example, the script below trains with 5 det tokens for 10 epochs on 1 GPU. Please refer to the comments in the script for more details.
sh script/RefCocog/run.sh 5 10 1
For the copsref dataset, we load the pretraining checkpoint as the initial weights. For example, the script below trains with 5 det tokens for 40 epochs on 1 GPU. Please refer to the comments in the script for more details.
sh script/copsref/run.sh 5 40 1
For the ReferItGame/RefClef dataset, we load the pretraining checkpoint as the initial weights. For example, the script below trains with 5 det tokens for 40 epochs on 1 GPU. Please refer to the comments in the script for more details.
sh script/ReferItGame/run.sh 5 40 1
If you find this method useful in your research, please cite this article:
@inproceedings{ho2022yoro,
title={YORO-Lightweight End to End Visual Grounding},
author={Ho, Chih-Hui and Appalaraju, Srikar and Jasani, Bhavan and Manmatha, R and Vasconcelos, Nuno},
booktitle={ECCV 2022 Workshop on International Challenge on Compositional and Multimodal Perception},
year={2022}
}
Please email Chih-Hui (John) Ho (chh279@eng.ucsd.edu) if you encounter further issues. We heavily used the code from