Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.
Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) to train an accurate object detector, yet they often appear on their own (e.g., in product images). However, such object-centric images have not been effectively leveraged to improve object detection in scene-centric images.
We propose Mosaic of Object-centric images as Scene-centric images (MosaicOS), a simple and novel framework that is surprisingly effective at tackling the challenges of long-tailed object detection. Keys to our approach are three-fold: (i) pseudo scene-centric image construction from object-centric images for mitigating domain differences, (ii) high-quality bounding box imputation using the object-centric images’ class labels, and (iii) a multistage training procedure. Check our paper for further details:
MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
by Cheng Zhang*, Tai-Yu Pan*, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao.
The script `mosaic.py` generates mosaic images and the corresponding annotations from an annotation file in COCO format (see the COCO data format documentation for details). The following command generates 2x2 mosaic images for the COCO training set. It writes the images to `OUTPUT_DIR/images/` and the annotation file to `OUTPUT_DIR/annotation.json`, using 4 processes. `--shuffle` shuffles the order of the images before they are combined. `--drop-last` drops the last few images if there are not enough of them to fill an `nrow * ncol` mosaic. `--demo 10` plots 10 synthesized images with their annotated boxes in `OUTPUT_DIR/demo/` for visualization.
```shell
python mosaic.py --coco-file datasets/coco/annotations/instances_train2017.json \
  --img-dir datasets/coco --output-dir output_mosaics \
  --num-proc 4 --nrow 2 --ncol 2 --shuffle --drop-last --demo 10
```
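The grouping and box bookkeeping that these flags control can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in `mosaic.py`. The helper names `group_images` and `mosaic_annotations` are hypothetical. Every tile is assumed to be resized to a fixed `cell_w x cell_h` cell, so each COCO box `[x, y, w, h]` is scaled by its tile's resize factor and then shifted by the tile's offset in the grid.

```python
import random


def group_images(image_ids, nrow, ncol, shuffle=False, drop_last=False, seed=0):
    """Split image ids into groups of nrow * ncol, one group per mosaic."""
    ids = list(image_ids)
    if shuffle:
        random.Random(seed).shuffle(ids)  # analogous to --shuffle
    k = nrow * ncol
    groups = [ids[i:i + k] for i in range(0, len(ids), k)]
    if drop_last and groups and len(groups[-1]) < k:
        groups.pop()  # analogous to --drop-last: skip an incomplete mosaic
    return groups


def mosaic_annotations(tiles, nrow, ncol, cell_w, cell_h):
    """Map each tile's COCO boxes into the frame of a single mosaic.

    tiles: list of (image, anns) pairs in row-major order, where image has
    'width' and 'height' and each ann has 'bbox' = [x, y, w, h] and
    'category_id'. Assumes (hypothetically) every tile is resized to
    cell_w x cell_h pixels.
    """
    assert len(tiles) <= nrow * ncol, "more tiles than grid cells"
    out = []
    for idx, (image, anns) in enumerate(tiles):
        row, col = divmod(idx, ncol)          # row-major placement
        sx = cell_w / image["width"]          # horizontal resize factor
        sy = cell_h / image["height"]         # vertical resize factor
        ox, oy = col * cell_w, row * cell_h   # top-left corner of this cell
        for ann in anns:
            x, y, w, h = ann["bbox"]
            out.append({
                "category_id": ann["category_id"],
                "bbox": [ox + x * sx, oy + y * sy, w * sx, h * sy],
            })
    return out
```

For example, with 10 images and a 2x2 grid, `drop_last=True` yields two mosaics and discards the last two images. Without it, the final mosaic would contain only two tiles.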
*Note: In our work, we synthesize mosaics from object-centric images with pseudo bounding boxes to fine-tune the pre-trained detector.
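Our implementation is based on Detectron2. All models are trained on the LVIS training set with Repeat Factor Sampling (RFS). In the tables below, AP<sub>r</sub>, AP<sub>c</sub> and AP<sub>f</sub> denote AP on the rare, common and frequent LVIS categories. The superscript b marks bounding-box AP.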
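Repeat Factor Sampling, introduced with LVIS, oversamples training images that contain rare categories. The sketch below computes the per-image repeat factors. Let f(c) be the fraction of training images containing category c and t a threshold. Each category gets r(c) = max(1, sqrt(t / f(c))), and each image takes the largest r(c) over its categories. The function name `repeat_factors` is hypothetical. In Detectron2 this is handled by `RepeatFactorTrainingSampler`, which additionally rounds fractional factors stochastically. t = 0.001 is the value typically used for LVIS.

```python
import math
from collections import defaultdict


def repeat_factors(image_categories, threshold=0.001):
    """Per-image repeat factors for Repeat Factor Sampling.

    image_categories: entry i is the collection of category ids in image i.
    """
    num_images = len(image_categories)
    # Count how many images contain each category (once per image).
    counts = defaultdict(int)
    for cats in image_categories:
        for c in set(cats):
            counts[c] += 1
    # r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of images with c.
    cat_factor = {
        c: max(1.0, math.sqrt(threshold / (n / num_images)))
        for c, n in counts.items()
    }
    # r(i) = max over the categories present in image i (1.0 if it has none).
    return [max((cat_factor[c] for c in set(cats)), default=1.0)
            for cats in image_categories]
```

With t = 0.001, consider a category that appears in 1 of 100,000 images. Its f(c) is 1e-5, so t / f(c) = 100, and any image containing it is repeated about 10 times per epoch.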
- Object detection
Backbone | Method | AP<sup>b</sup> | AP<sup>b</sup><sub>r</sub> | AP<sup>b</sup><sub>c</sub> | AP<sup>b</sup><sub>f</sub> | Download |
---|---|---|---|---|---|---|
R50-FPN | Faster R-CNN | 23.4 | 13.0 | 22.6 | 28.4 | model |
R50-FPN | MosaicOS | 25.0 | 20.2 | 23.9 | 28.3 | model |
- Instance segmentation
Backbone | Method | AP | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | AP<sup>b</sup> | Download |
---|---|---|---|---|---|---|---|
R50-FPN | Mask R-CNN | 24.4 | 16.0 | 24.0 | 28.3 | 23.6 | model |
R50-FPN | MosaicOS | 26.3 | 19.7 | 26.6 | 28.5 | 25.8 | model |
- Object detection
Backbone | Method | AP<sup>b</sup> | AP<sup>b</sup><sub>r</sub> | AP<sup>b</sup><sub>c</sub> | AP<sup>b</sup><sub>f</sub> | Download |
---|---|---|---|---|---|---|
R50-FPN | Faster R-CNN | 22.0 | 10.6 | 20.1 | 29.2 | model |
R50-FPN | MosaicOS | 23.9 | 15.5 | 22.4 | 29.3 | model |
- Instance segmentation
Backbone | Method | AP | AP<sub>r</sub> | AP<sub>c</sub> | AP<sub>f</sub> | AP<sup>b</sup> | Download |
---|---|---|---|---|---|---|---|
R50-FPN | Mask R-CNN | 22.6 | 12.3 | 21.3 | 28.6 | 23.3 | model |
R50-FPN | MosaicOS | 24.5 | 18.2 | 23.0 | 28.8 | 25.1 | model |
R101-FPN | Mask R-CNN | 24.8 | 15.2 | 23.7 | 30.3 | 25.5 | model |
R101-FPN | MosaicOS | 26.7 | 20.5 | 25.8 | 30.5 | 27.4 | model |
X101-FPN | Mask R-CNN | 26.7 | 17.6 | 25.6 | 31.9 | 27.4 | model |
X101-FPN | MosaicOS | 28.3 | 21.8 | 27.2 | 32.4 | 28.9 | model |
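LVIS assigns each category to the rare, common or frequent group according to how many training images contain it. Rare covers 1 to 10 images, common covers 11 to 100, and frequent covers more than 100. AP<sub>r</sub>, for instance, is the AP averaged over the rare categories only. The small sketch below shows that grouping. The function names and the per-category AP values and image counts passed in are hypothetical. LVIS annotation files already store each category's group in its `frequency` field.

```python
def lvis_frequency_bin(image_count):
    """LVIS group for a category that appears in `image_count` training images."""
    if image_count <= 10:
        return "r"  # rare: 1-10 images
    if image_count <= 100:
        return "c"  # common: 11-100 images
    return "f"      # frequent: more than 100 images


def grouped_ap(per_category_ap, image_counts):
    """Average per-category AP within each frequency group (AP_r, AP_c, AP_f)."""
    groups = {"r": [], "c": [], "f": []}
    for cat, ap in per_category_ap.items():
        groups[lvis_frequency_bin(image_counts[cat])].append(ap)
    # Skip empty groups instead of dividing by zero.
    return {g: sum(v) / len(v) for g, v in groups.items() if v}
```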
If you find this work useful, please cite it with the following BibTeX entry:
@inproceedings{zhang2021mosaicos,
title={{MosaicOS}: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection},
author={Zhang, Cheng and Pan, Tai-Yu and Li, Yandong and Hu, Hexiang and Xuan, Dong and Changpinyo, Soravit and Gong, Boqing and Chao, Wei-Lun},
booktitle = {ICCV},
year={2021}
}
Feel free to email us if you have any questions.
Cheng Zhang (zhang.7804@osu.edu), Tai-Yu Pan (pan.667@osu.edu), Wei-Lun Harry Chao (chao.209@osu.edu)