This project hosts the code for the implementation of Weakly Supervised Instance Segmentation using the Bounding Box Tightness Prior (NeurIPS 2019).
The main code is based on maskrcnn-benchmark (#5c44ca7) and the post-processing code is based on meanfield-matlab.
This paper presents a weakly supervised instance segmentation method that consumes training data with tight bounding box annotations. The major difficulty lies in the uncertain figure-ground separation within each bounding box, since there is no supervisory signal about it. We address this difficulty by formulating the problem as a multiple instance learning (MIL) task, and generate positive and negative bags based on the sweeping lines of each bounding box. The proposed deep model integrates MIL into a fully supervised instance segmentation network and is trained with an objective consisting of two terms, i.e., the unary term and the pairwise term. The former estimates the foreground and background areas of each bounding box, while the latter maintains the unity of the estimated object masks. The experimental results show that our method performs favorably against existing weakly supervised methods and even surpasses some fully supervised methods for instance segmentation on the PASCAL VOC dataset.
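To make the bag construction and the two loss terms concrete, here is a minimal NumPy sketch. It is not the released implementation (which runs inside the maskrcnn-benchmark code on RoI-level mask predictions). The inclusive `(x0, y0, x1, y1)` box convention, drawing negative bags from full image rows/columns outside the box, max-pooling as the bag score, and a 4-connected squared-difference pairwise term are all simplifying assumptions, and every function name is hypothetical.

```python
import numpy as np


def sweeping_line_bags(box, height, width):
    """Build MIL bags from a tight box on an H x W grid (illustrative only).

    box = (x0, y0, x1, y1) with inclusive pixel coordinates (assumed convention).
    Positive bags: every horizontal/vertical line segment crossing the box;
    box tightness implies each one touches the object at least once.
    Negative bags: every full row/column lying entirely outside the box.
    Each bag is a (rows, cols) pair of index arrays into the mask.
    """
    x0, y0, x1, y1 = box
    xs, ys = np.arange(x0, x1 + 1), np.arange(y0, y1 + 1)
    pos = [(np.full(xs.size, y), xs) for y in ys]    # horizontal lines in the box
    pos += [(ys, np.full(ys.size, x)) for x in xs]   # vertical lines in the box
    all_x, all_y = np.arange(width), np.arange(height)
    neg = [(np.full(width, y), all_x) for y in all_y if y < y0 or y > y1]
    neg += [(all_y, np.full(height, x)) for x in all_x if x < x0 or x > x1]
    return pos, neg


def mil_unary_loss(prob, pos, neg, eps=1e-6):
    """Bag-level cross-entropy; a bag's score is its max foreground probability."""
    pos_scores = np.array([prob[r, c].max() for r, c in pos])
    neg_scores = np.array([prob[r, c].max() for r, c in neg])
    loss = -np.log(pos_scores + eps).sum() - np.log(1.0 - neg_scores + eps).sum()
    return loss / (len(pos) + len(neg))


def pairwise_loss(prob):
    """Mean squared difference between 4-connected neighbouring pixels."""
    dy = (prob[1:, :] - prob[:-1, :]) ** 2
    dx = (prob[:, 1:] - prob[:, :-1]) ** 2
    return (dy.sum() + dx.sum()) / (dy.size + dx.size)


if __name__ == "__main__":
    pos, neg = sweeping_line_bags((2, 2, 5, 5), height=8, width=8)
    fitted = np.zeros((8, 8))
    fitted[2:6, 2:6] = 0.9           # confident foreground filling the box
    uniform = np.full((8, 8), 0.5)   # uninformative prediction
    print(len(pos), len(neg))
    print(mil_unary_loss(fitted, pos, neg), mil_unary_loss(uniform, pos, neg))
    print(pairwise_loss(fitted), pairwise_loss(uniform))
```

The unary term rewards predictions that put some foreground on every line crossing the box and none outside it, while the pairwise term alone would prefer a constant mask; their combination is what pushes the estimate toward a compact object.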
Check INSTALL.md for installation instructions.
Details of the dataset construction can be found in Sec. 4.1 of our paper.
Training
We construct the training set under the following two settings:
- PASCAL VOC (Augmented)
  - Extend the VOC 2012 training set with the SBD training set.
  - This results in an augmented set of 10582 training images. (COCO format download link)
- PASCAL VOC (Augmented) + COCO
  - Extend the VOC (Augmented) training set with the COCO dataset.
  - From COCO, consider only images that contain at least one of the 20 PASCAL VOC classes, and only objects whose bounding box area is larger than 200 pixels.
  - After filtering, 99310 images remain from the COCO training and validation sets combined.
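The COCO filtering rule can be sketched over a COCO-format annotation dict as follows. The helper name `filter_coco_for_voc` is hypothetical; reading "bounding box area" as `width * height` of COCO's `[x, y, width, height]` box, dropping the non-qualifying objects of a retained image, and ignoring `iscrowd` are assumptions, not a description of the authors' exact script (see Sec. 4.1 of the paper for that).

```python
# COCO category name -> PASCAL VOC class name for the 20 shared classes.
COCO_TO_VOC = {
    "airplane": "aeroplane", "bicycle": "bicycle", "bird": "bird",
    "boat": "boat", "bottle": "bottle", "bus": "bus", "car": "car",
    "cat": "cat", "chair": "chair", "cow": "cow",
    "dining table": "diningtable", "dog": "dog", "horse": "horse",
    "motorcycle": "motorbike", "person": "person",
    "potted plant": "pottedplant", "sheep": "sheep", "couch": "sofa",
    "train": "train", "tv": "tvmonitor",
}
# The standard VOC class order is alphabetical, so sorting reproduces it.
VOC_CLASSES = sorted(set(COCO_TO_VOC.values()))


def filter_coco_for_voc(coco, min_box_area=200.0):
    """Keep VOC-class objects with box area > min_box_area, and images that keep any."""
    voc_id = {name: i + 1 for i, name in enumerate(VOC_CLASSES)}
    cat_map = {c["id"]: voc_id[COCO_TO_VOC[c["name"]]]
               for c in coco["categories"] if c["name"] in COCO_TO_VOC}
    kept_anns = []
    for ann in coco["annotations"]:
        if ann["category_id"] not in cat_map:
            continue                                  # not one of the 20 VOC classes
        _, _, w, h = ann["bbox"]
        if w * h <= min_box_area:
            continue                                  # box too small
        # Copy the annotation and remap its category to the VOC id.
        kept_anns.append(dict(ann, category_id=cat_map[ann["category_id"]]))
    kept_ids = {a["image_id"] for a in kept_anns}
    return {
        "images": [img for img in coco["images"] if img["id"] in kept_ids],
        "annotations": kept_anns,
        "categories": [{"id": voc_id[n], "name": n} for n in VOC_CLASSES],
    }
```

Applied to the merged COCO train and val annotations, this kind of filter should yield a set comparable to the one described above; small differences in the area or crowd handling would change the final image count.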
Testing
We evaluate our method on the PASCAL VOC 2012 validation set. (COCO format download link)
Note that converting the annotations from VOC to COCO format results in inaccurate segment boundaries. See Evaluation for more details.
Format and Path
In our experiments, we convert the generated datasets into COCO format.
Before training, please modify paths_catalog.py and enter the correct data paths for voc_2012_aug_train_cocostyle, voc_2012_val_cocostyle, and voc_2012_coco_aug_train_cocostyle.
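For reference, upstream maskrcnn-benchmark's `DatasetCatalog` in `paths_catalog.py` maps each dataset name to an `img_dir` and an `ann_file` relative to `DATA_DIR`, and names containing `coco` are loaded with `COCODataset`. The standalone sketch below mirrors that layout so you can check your paths before training. Every directory and file name in it is a hypothetical placeholder, and `resolve`/`missing_paths` are illustrative helpers, not part of the repository.

```python
import os

# Hypothetical layout: replace every path with where you actually stored the data.
DATA_DIR = "datasets"
DATASETS = {
    "voc_2012_aug_train_cocostyle": {
        "img_dir": "voc_aug/images",
        "ann_file": "voc_aug/annotations/train.json",
    },
    "voc_2012_val_cocostyle": {
        "img_dir": "voc_val/images",
        "ann_file": "voc_val/annotations/val.json",
    },
    "voc_2012_coco_aug_train_cocostyle": {
        "img_dir": "voc_coco_aug/images",
        "ann_file": "voc_coco_aug/annotations/train.json",
    },
}


def resolve(name, data_dir=DATA_DIR):
    """Mirror how DatasetCatalog.get() builds COCODataset args for 'coco' names."""
    attrs = DATASETS[name]
    return {
        "factory": "COCODataset",
        "args": {
            "root": os.path.join(data_dir, attrs["img_dir"]),
            "ann_file": os.path.join(data_dir, attrs["ann_file"]),
        },
    }


def missing_paths(data_dir=DATA_DIR):
    """List (dataset, path) pairs that do not exist on disk."""
    missing = []
    for name in DATASETS:
        for path in resolve(name, data_dir)["args"].values():
            if not os.path.exists(path):
                missing.append((name, path))
    return missing


if __name__ == "__main__":
    for name, path in missing_paths():
        print(f"[{name}] missing: {path}")
```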
Run the bash files directly:
Training on PASCAL VOC (Augmented) with 4 GPUs
bash train_voc_aug.sh
Training on PASCAL VOC (Augmented) + COCO with 4 GPUs
bash train_voc_coco_aug.sh
Alternatively, run the commands directly:
Training on PASCAL VOC (Augmented) with 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 tools/train_net.py --config-file ./configs/BBTP/e2e_mask_rcnn_R_101_FPN_4x_voc_aug_cocostyle.yaml
Training on PASCAL VOC (Augmented) + COCO with 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 tools/train_net.py --config-file ./configs/BBTP/e2e_mask_rcnn_R_101_FPN_4x_voc_coco_aug_cocostyle.yaml
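If you launch runs from Python (for example, to script both settings), a hypothetical helper could assemble the same command. It assumes this fork keeps upstream maskrcnn-benchmark's trailing `opts` argument in `tools/train_net.py`, which forwards `KEY VALUE` pairs to the config; `OUTPUT_DIR` is a standard maskrcnn-benchmark config key used here only as an example.

```python
import shlex
import sys

# Config files named in the commands above.
CONFIGS = {
    "voc_aug": "./configs/BBTP/e2e_mask_rcnn_R_101_FPN_4x_voc_aug_cocostyle.yaml",
    "voc_coco_aug": "./configs/BBTP/e2e_mask_rcnn_R_101_FPN_4x_voc_coco_aug_cocostyle.yaml",
}


def build_train_cmd(setting, ngpus=4, opts=()):
    """Return the argv list for a distributed training run (hypothetical helper).

    Trailing `opts` are KEY VALUE pairs appended after the config file; upstream
    maskrcnn-benchmark merges them into the config (assumed unchanged here).
    """
    cmd = [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={ngpus}",
        "tools/train_net.py",
        "--config-file", CONFIGS[setting],
    ]
    cmd.extend(str(o) for o in opts)
    return cmd


if __name__ == "__main__":
    cmd = build_train_cmd("voc_aug", ngpus=4, opts=["OUTPUT_DIR", "./output/voc_aug"])
    print(shlex.join(cmd))
    # To actually start training: subprocess.run(cmd, check=True)
```

Keeping the command as a list (rather than a single string) avoids shell quoting issues when paths contain spaces.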
All bash commands are derived from maskrcnn-benchmark (#5c44ca7).
You may also want to see the original README.md of maskrcnn-benchmark.
Evaluation
Although the COCO dataset has its own Python API for evaluation, converting the annotations from VOC to COCO format results in inaccurate segment boundaries.
To avoid this issue, we recommend evaluating the predictions directly with the standard VOC API, using the following steps:
1. Save the predictions
- Set the key TEST.SAVE_PRED_AS_MAT to True in the config files (example).
- Run test_voc_aug.sh or test_voc_coco_aug.sh; the predictions will then be saved as mask.mat in the directory. (The .mat file is usually around 4 to 5 GB.)
2. Evaluate
- Download VOCcode.
- Set up the paths in EvalBaseline.m (L3~L16).
- Run EvalBaseline.m in MATLAB.
3. (Optional) Post-processing (DenseCRF)
- Set up the paths in Run_VOCInst.m (L4~L22).
- Run Run_VOCInst.m in MATLAB.
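The results below are reported as mAP at several mask-IoU thresholds. The following NumPy sketch shows how such a number is typically computed for one class: score-sorted predictions are greedily matched to unused ground-truth masks, then AP is taken as the area under the all-point interpolated precision-recall curve; mAP@t would be the mean over the 20 classes. It is an illustration under those assumptions (including a strict `>` IoU comparison), not a replacement for EvalBaseline.m, whose exact matching and interpolation details may differ.

```python
import numpy as np


def mask_iou(a, b):
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def voc_ap(rec, prec):
    """All-point interpolated AP, as in the VOC 2010+ devkit."""
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.concatenate(([0.0], prec, [0.0]))
    for i in range(mpre.size - 2, -1, -1):      # make precision non-increasing
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]     # points where recall changes
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mask_ap(preds, gts, thr=0.5):
    """AP of one class at a mask-IoU threshold.

    preds: list of (image_id, score, bool mask).
    gts:   dict image_id -> list of bool ground-truth masks.
    """
    n_gt = sum(len(v) for v in gts.values())
    used = {k: [False] * len(v) for k, v in gts.items()}
    order = sorted(preds, key=lambda p: -p[1])   # highest score first
    tp, fp = np.zeros(len(order)), np.zeros(len(order))
    for i, (img, _, mask) in enumerate(order):
        ious = [mask_iou(mask, g) for g in gts.get(img, [])]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] > thr and not used[img][j]:
            tp[i] = 1.0
            used[img][j] = True
        else:
            fp[i] = 1.0   # no overlap, below threshold, or duplicate detection
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    rec = tp / max(n_gt, 1)
    prec = tp / np.maximum(tp + fp, np.finfo(float).eps)
    return voc_ap(rec, prec)
```

A mask with IoU 0.6 counts as correct at mAP@0.50 but not at mAP@0.70, which is why the stricter columns below are much more sensitive to boundary quality.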
We provide the model weights and mask files of all experiments in this section.
Reported in the main paper
Dataset | mAP@0.25 | mAP@0.50 | mAP@0.70 | mAP@0.75 | Post-processing | Model |
---|---|---|---|---|---|---|
PASCAL VOC (Augmented) | 74.7 | 53.7 | 23.6 | 16.9 | w/o DenseCRF | link |
PASCAL VOC (Augmented) | 75.0 | 58.9 | 30.4 | 21.6 | w/ DenseCRF | - |
PASCAL VOC (Augmented) + COCO | 76.8 | 54.4 | 23.7 | 17.4 | w/o DenseCRF | link |
PASCAL VOC (Augmented) + COCO | 77.2 | 60.1 | 29.4 | 21.2 | w/ DenseCRF | - |
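A quick calculation on the numbers in the table above shows where DenseCRF helps. The helper name is hypothetical; the values are copied verbatim from the table.

```python
METRICS = ["mAP@0.25", "mAP@0.50", "mAP@0.70", "mAP@0.75"]

# (w/o DenseCRF, w/ DenseCRF) rows copied from the table above.
REPORTED = {
    "PASCAL VOC (Augmented)": ([74.7, 53.7, 23.6, 16.9], [75.0, 58.9, 30.4, 21.6]),
    "PASCAL VOC (Augmented) + COCO": ([76.8, 54.4, 23.7, 17.4], [77.2, 60.1, 29.4, 21.2]),
}


def densecrf_gains(rows):
    """Per-threshold mAP change from adding DenseCRF, rounded to 0.1."""
    return {
        name: dict(zip(METRICS, (round(w - wo, 1) for wo, w in zip(before, after))))
        for name, (before, after) in rows.items()
    }


if __name__ == "__main__":
    for name, gains in densecrf_gains(REPORTED).items():
        print(name, gains)
```

In both settings DenseCRF adds less than half a point at mAP@0.25 but 3.8 to 6.8 points at mAP@0.50 and above, consistent with it mainly refining mask boundaries rather than finding additional instances.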
Reproduced with the released code
Dataset | mAP@0.25 | mAP@0.50 | mAP@0.70 | mAP@0.75 | Post-processing | Model |
---|---|---|---|---|---|---|
PASCAL VOC (Augmented) | 74.0 | 54.1 | 24.5 | 17.1 | w/o DenseCRF | link |
PASCAL VOC (Augmented) | 74.4 | 59.1 | 30.2 | 21.9 | w/ DenseCRF | - |
* The training log can be found here.
Environments
- Hardware
  - 4 NVIDIA GTX 1080 Ti GPUs
- Software
  - PyTorch 1.0.1
  - CUDA 10.2
Please consider citing our paper in your publications if the project helps your research.
@inproceedings{hsu2019bbtp,
title = {Weakly Supervised Instance Segmentation using the Bounding Box Tightness Prior},
author = {Cheng-Chun Hsu and Kuang-Jui Hsu and Chung-Chi Tsai and Yen-Yu Lin and Yung-Yu Chuang},
booktitle = {Neural Information Processing Systems},
year = {2019}
}