🏠[Project page] 📄[GRES Arxiv] 📄[GREC Arxiv]
This repository contains information and tools for the gRefCOCO dataset, proposed by the CVPR 2023 Highlight paper:
GRES: Generalized Referring Expression Segmentation
Chang Liu, Henghui Ding, Xudong Jiang
CVPR 2023 Highlight, Acceptance Rate 2.5%
⬇️ Get the gRefCOCO dataset from:
- ☁️ OneDrive
- Like RefCOCO, gRefCOCO should be used together with the `train2014` images of MS COCO.
- An example dataloader, `grefer.py`, is provided (a usage sketch follows this list).
- We will update this repository with a full API package and documentation soon. Please follow the usage in the baseline code for now.
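Below is a minimal usage sketch of the provided dataloader. It assumes `grefer.py` exposes a `G_REFER` class mirroring the original refer API; the constructor arguments and method names shown here are assumptions, so please check `grefer.py` itself for the exact interface.

```python
# Minimal usage sketch (assumes grefer.py mirrors the original refer API with a
# G_REFER class; constructor arguments and method names are assumptions).
from grefer import G_REFER

# data_root is expected to contain the gRefCOCO annotations; the COCO train2014
# images should be available as described above.
g_refer = G_REFER(data_root='datasets', dataset='grefcoco', splitBy='unc')

ref_ids = g_refer.getRefIds(split='val')   # referring-expression ids in the val split
refs = g_refer.loadRefs(ref_ids[:3])       # load a few references
for ref in refs:
    sents = [s['sent'] for s in ref['sentences']]
    print(ref['ref_id'], ref['image_id'], sents)
```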
- The GREC evaluation metric code is here.
- We provide a GREC baseline based on MDETR; its training and inference steps are as follows:
- Process gRefCOCO annotations into COCO format:
  ```
  python scripts/fine-tuning/grefexp_coco_format.py --data_path xxx --out_path mdetr_annotations/ --coco_path xxx
  ```
- Download `pretrained_resnet101_checkpoint.pth` from MDETR and start training:
  ```
  python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --dataset_config configs/grefcoco.json --batch_size 4 --load pretrained_resnet101_checkpoint.pth --ema --text_encoder_lr 1e-5 --lr 5e-5 --output-dir grefcoco
  ```
- Obtain `checkpoint.pth` after training, or download the trained model here: ☁️ Google Drive
- Run evaluation with the command below. For test results, pass `--test` and `--test_type test`, `testA`, or `testB` according to the dataset (see the example after this list):
  ```
  python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --dataset_config configs/grefcoco.json --batch_size 4 --resume grefcoco/checkpoint.pth --ema --eval
  ```
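For example, assuming the test flags are simply appended to the evaluation command above, a run on the `testA` split might look like:
```
python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --dataset_config configs/grefcoco.json --batch_size 4 --resume grefcoco/checkpoint.pth --ema --eval --test --test_type testA
```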
For the GRES baseline, please refer to ReLA for more details.
Our project is built upon refer and cocoapi. Many thanks to the authors for their great work!
Please consider citing GRES/GREC if they help your research.
@inproceedings{GRES,
title={{GRES}: Generalized Referring Expression Segmentation},
author={Liu, Chang and Ding, Henghui and Jiang, Xudong},
booktitle={CVPR},
year={2023}
}
@article{GREC,
title={{GREC}: Generalized Referring Expression Comprehension},
author={He, Shuting and Ding, Henghui and Liu, Chang and Jiang, Xudong},
journal={arXiv preprint arXiv:2308.16182},
year={2023}
}
We also recommend other highly related works:
@article{VLT,
title={{VLT}: Vision-language transformer and query generation for referring segmentation},
author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2023},
volume={45},
number={6},
publisher={IEEE}
}
@inproceedings{MeViS,
title={{MeViS}: A Large-scale Benchmark for Video Segmentation with Motion Expressions},
author={Ding, Henghui and Liu, Chang and He, Shuting and Jiang, Xudong and Loy, Chen Change},
booktitle={ICCV},
year={2023}
}