Authors: Tong Shen, Guosheng Lin, Lingqiao Liu, Chunhua Shen, Ian Reid
Training a Fully Convolutional Network (FCN) for semantic segmentation requires a large number of masks with pixel-level labelling, which involves a large amount of human labour and time for annotation. In contrast, web images and their image-level labels are much easier and cheaper to obtain. In this work, we propose a novel method for weakly supervised semantic segmentation with only image-level labels. The method utilizes the internet to retrieve a large number of images and uses a large-scale co-segmentation framework to generate masks for the retrieved images. We first retrieve images from search engines, e.g. Flickr and Google, using semantic class names as queries, e.g. the class names in the PASCAL VOC 2012 dataset. We then use the high-quality masks produced by co-segmentation on the retrieved images, together with the target dataset images with image-level labels, to train segmentation networks. We obtain an IoU score of 56.9 on the PASCAL VOC 2012 test set, which reaches state-of-the-art performance.
Please consider citing us if you find this work useful:
@inproceedings{Shen:2017:wss,
author = {Tong Shen and
Guosheng Lin and
Lingqiao Liu and
Chunhua Shen and
Ian Reid},
title = {Weakly Supervised Semantic Segmentation Based on Web Image Co-segmentation},
booktitle = {BMVC},
year = {2017}
}
The code is implemented in MXNet. Please follow the official website (HERE) for installation, and make sure MXNet is compiled with OpenCV support.
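As a quick sanity check that MXNet is importable and was built with OpenCV support, something like the following should run without errors (the image path is only a placeholder; use any JPEG on disk):

import mxnet as mx

print(mx.__version__)

# mx.image.imdecode is backed by OpenCV and raises an error
# if MXNet was compiled without OpenCV support
with open('any_image.jpg', 'rb') as f:  # placeholder path
    img = mx.image.imdecode(f.read())
print(img.shape)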
The other Python dependencies can be found in "dependencies.txt" and can be installed with:
pip install -r dependencies.txt
The web data can be downloaded here. Since the co-segmentation code is not included in this repository (Original Github), you can either run that code to generate the masks yourself or use the provided masks, which are already processed. To use the provided masks, extract the files and put all the images and masks in "dataset/web_images" and "dataset/web_labels" respectively, with no subfolders.
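The expected layout is flat (the filenames and extensions below are only illustrative):

dataset/
    web_images/
        img_0001.jpg
        img_0002.jpg
        ...
    web_labels/
        img_0001.png
        img_0002.png
        ...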
For PASCAL VOC data, please download PASCAL VOC12 (HERE) and SBD (HERE). Then extract the files into the folder "dataset" and run:
python create_dataset.py
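For reference, the standard VOC12 and SBD archives extract to the structure below; the exact names expected by create_dataset.py are not verified here, so adjust if necessary:

dataset/
    VOCdevkit/
        VOC2012/
            JPEGImages/
            SegmentationClass/
            ImageSets/
            ...
    benchmark_RELEASE/
        dataset/
            img/
            cls/
            ...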
First, download the ResNet-50 model pretrained on ImageNet (HERE) and put it in the folder "models".
To train the "Initial Mask Generator", simply run:
python train_seg_model.py --model init --gpus 0,1,2,3
To evaluate a certain snapshot (for example epoch X), run:
python eval_seg_model.py --model init --gpu 0 --epoch X
To evaluate all the snapshots, run:
python eval_loop.py --model init --gpu 0
Each evaluated snapshot will have a corresponding folder in "outputs". eval_loop.py checks whether there are any unevaluated snapshots and evaluates them.
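The logic of eval_loop.py is roughly the following. This is a simplified sketch, not the actual script; the snapshot naming (MXNet's "<prefix>-<epoch>.params" convention) and the output folder name are assumptions:

import glob
import os
import re
import subprocess

model = 'init'
# snapshots saved as snapshots/<model>-<epoch>.params (naming assumed)
for path in sorted(glob.glob('snapshots/%s-*.params' % model)):
    epoch = int(re.search(r'-(\d+)\.params$', path).group(1))
    out_dir = os.path.join('outputs', '%s_epoch%d' % (model, epoch))  # hypothetical folder name
    if os.path.exists(out_dir):
        continue  # this snapshot has already been evaluated
    subprocess.check_call(['python', 'eval_seg_model.py',
                           '--model', model, '--gpu', '0',
                           '--epoch', str(epoch)])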
To further improve the score, finetune a snapshot (for example epoch X) with a smaller learning rate:
python train_seg_model.py --model init --gpus 0,1,2,3 --epoch X --lr 16e-5
Check the evaluation log in "log/eval_model.log" and find the best snapshot for the mask generator (a trained one can be downloaded HERE). Supposing the best epoch is X, run:
python est_voc_train_masks.py --gpu 0 --epoch X
python train_seg_model.py --model final --gpus 0,1,2,3
python eval_loop.py --model final --gpu 0
The above commands estimate the masks for the VOC training images, train the final model, and evaluate its snapshots.
The snapshots will be saved in folder "snapshots". To evaluate a snapshot, simply use (for example epoch X):
python eval_seg_model.py --model final --gpu 0 --epoch X
There are other flags:
--ms        use multi-scale inference
--savemask  save output masks
--crf       use CRF as post-processing
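For example, to evaluate a snapshot (epoch X) with multi-scale inference and CRF post-processing while saving the predicted masks:
python eval_seg_model.py --model final --gpu 0 --epoch X --ms --crf --savemask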
A trained model can be downloaded HERE. Put it in the folder "snapshots" and run:
python eval_seg_model.py --model final --gpu 0 --epoch 23 --crf --ms
This gives an IoU of 56.4, as reported in the paper.
A demo is given in "demo". Download the final model (HERE), put it in the folder "snapshots", and use Jupyter to run "Demo.ipynb".
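For reference, the core of such a demo in MXNet looks roughly like the sketch below. The checkpoint prefix, input shape, and the omitted image preprocessing are assumptions; see "Demo.ipynb" for the actual details:

import mxnet as mx
import numpy as np

# load the trained checkpoint (prefix "snapshots/final" and epoch 23 assumed)
sym, arg_params, aux_params = mx.model.load_checkpoint('snapshots/final', 23)
mod = mx.mod.Module(symbol=sym, context=mx.gpu(0), label_names=None)
mod.bind(data_shapes=[('data', (1, 3, 500, 500))], for_training=False)
mod.set_params(arg_params, aux_params, allow_missing=True)

# a dummy tensor standing in for a preprocessed input image
img = mx.nd.zeros((1, 3, 500, 500))
mod.forward(mx.io.DataBatch([img], []), is_train=False)
scores = mod.get_outputs()[0].asnumpy()  # class scores, e.g. (1, 21, H, W) for VOC
mask = np.argmax(scores[0], axis=0)      # per-pixel predicted class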
There are some examples here.