This repository is the official PyTorch implementation of the ECCV 2024 paper SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding. SegVG converts box-level annotations into segmentation signals, which provide additional pixel-level supervision for visual grounding. In addition, our proposed Triple Alignment module updates the query, text, and vision tokens triangularly to mitigate domain discrepancy. Please cite our paper if the paper or codebase is helpful to you.
@article{kang2024segvg,
title={Segvg: Transferring object bounding box to segmentation for visual grounding},
author={Kang, Weitai and Liu, Gaowen and Shah, Mubarak and Yan, Yan},
journal={arXiv preprint arXiv:2407.03200},
year={2024}}
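To make the box-to-segmentation idea from the introduction concrete, the following is a minimal sketch of turning a bounding box into a binary mask that could act as a pixel-level target. It is not the repository's implementation: the function name `box_to_mask`, the `(x1, y1, x2, y2)` coordinate convention with exclusive upper bounds, and the pure-Python list representation are all assumptions made for illustration.

```python
def box_to_mask(box, height, width):
    """Return a height x width binary mask (nested lists) that is 1 inside the box.

    `box` is (x1, y1, x2, y2) in pixel coordinates, with x2 and y2 exclusive.
    This coordinate convention is an assumption made for this sketch.
    """
    x1, y1, x2, y2 = box
    # Clamp the box to the image so out-of-range boxes do not cause errors.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [
        [1 if (y1 <= y < y2 and x1 <= x < x2) else 0 for x in range(width)]
        for y in range(height)
    ]


# Example: a 3x2 box inside a 6x4 image.
mask = box_to_mask((1, 1, 4, 3), height=4, width=6)
for row in mask:
    print(row)
```

In a real training pipeline the mask would more likely be built as a tensor at the feature-map resolution; the nested lists here only keep the sketch dependency-free.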
- Clone this repository.
git clone https://github.com/WeitaiKang/SegVG.git
- Prepare the environment.
Please refer to ReSC for setting up the environment. We use PyTorch version 1.12.1+cu116.
- Prepare the data.
Please download the COCO train2014 images.
Please download the referring expression annotations from the 'annotation' directory of SegVG.
Please download the ResNet101 ckpts of the vision backbone from TransVG.
You can place them wherever you want. Just remember to set the paths correctly in your train.sh and test.sh.
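Since the files can live anywhere, it may help to confirm that every location exists before launching a run. The sketch below is one way to do that; the helper `find_missing` and all paths in the `paths` dictionary are hypothetical placeholders, not names used by this repository, and should be replaced with the values you set in train.sh and test.sh.

```python
from pathlib import Path


def find_missing(paths):
    """Return the names of entries in `paths` whose location does not exist."""
    return [name for name, location in paths.items() if not Path(location).exists()]


# Hypothetical locations: replace them with the paths set in train.sh and test.sh.
paths = {
    "COCO train2014 images": "./data/train2014",
    "annotations": "./data/annotation",
    "ResNet101 backbone ckpt": "./checkpoints/resnet101.pth",
}

missing = find_missing(paths)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All paths found.")
```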
Our model ckpts are available in the 'ckpt' directory of SegVG.
- RefCOCO
Model | val | testA | testB |
---|---|---|---|
SegVG | 86.84 | 89.46 | 83.07 |
- RefCOCO+
Model | val | testA | testB |
---|---|---|---|
SegVG | 77.18 | 82.63 | 67.59 |
- RefCOCOg
Model | val-g | val-u | test-u |
---|---|---|---|
SegVG | 76.01 | 78.35 | 77.42 |
- ReferItGame
Model | test |
---|---|
SegVG | 75.59 |
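The tables above do not name the metric. Assuming they follow the convention common in visual grounding benchmarks, where a prediction counts as correct when its IoU with the ground-truth box is at least 0.5, the score could be computed roughly as follows. The function names `box_iou` and `grounding_accuracy` are illustrative and are not taken from this codebase.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds, targets, threshold=0.5):
    """Percentage of predictions whose IoU with the target reaches the threshold.

    The 0.5 threshold is an assumed convention, not confirmed by this README.
    """
    hits = sum(box_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return 100.0 * hits / len(preds)


preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
targets = [(0, 0, 10, 8), (20, 20, 30, 30)]
print(grounding_accuracy(preds, targets))  # one hit out of two predictions
```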
- Training
bash train.sh
Please take a look at train.sh to set the parameters.
- Evaluation
bash test.sh
Please take a look at test.sh to set the parameters.
This codebase is partially based on TransVG.