We present MosaicFusion, a general diffusion-based data augmentation pipeline for large-vocabulary instance segmentation. The instance segmentation dataset synthesized by MosaicFusion can be used to train various downstream detection and segmentation models and improve their performance, especially on rare and novel categories.
🤩 Key Properties
MosaicFusion is a training-free, diffusion-based dataset augmentation pipeline that uses off-the-shelf text-to-image diffusion models to produce image and mask pairs containing multiple objects simultaneously. The pipeline consists of two components: image generation and mask generation.
Given only the names of the categories of interest, MosaicFusion generates high-quality multi-object images and their masks simultaneously by conditioning each image region on its own text prompt.
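To make the region-wise conditioning concrete, here is a minimal sketch of how a canvas could be split into a 2x2 mosaic with one text prompt per region. The function name mosaic_layout, the fixed four-region layout, the caller-supplied split point, and the "a photo of a single <name>" prompt template are all assumptions for illustration, not MosaicFusion's code; the diffusion and mask-generation steps themselves are not shown.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not MosaicFusion's code): split a W x H canvas into a
# 2x2 mosaic at the split point (CX, CY) and give each region its own prompt.
# The 2x2 layout and the "a photo of a single <name>" template are assumptions.

# Usage: mosaic_layout W H CX CY NAME1 NAME2 NAME3 NAME4
# Prints one line per region: "x0 y0 x1 y1 prompt" (x1 and y1 are exclusive).
mosaic_layout() {
  if [ "$#" -ne 8 ]; then
    echo "usage: mosaic_layout W H CX CY NAME1 NAME2 NAME3 NAME4" >&2
    return 1
  fi
  local w="$1" h="$2" cx="$3" cy="$4" tpl="a photo of a single"
  shift 4
  # The split point must lie strictly inside the canvas so no region is empty.
  if [ "$cx" -le 0 ] || [ "$cx" -ge "$w" ] || [ "$cy" -le 0 ] || [ "$cy" -ge "$h" ]; then
    echo "mosaic_layout: split point must lie strictly inside the canvas" >&2
    return 1
  fi
  # Regions in row-major order: top-left, top-right, bottom-left, bottom-right.
  echo "0 0 $cx $cy $tpl $1"
  echo "$cx 0 $w $cy $tpl $2"
  echo "0 $cy $cx $h $tpl $3"
  echo "$cx $cy $w $h $tpl $4"
}

mosaic_layout 512 512 256 200 cat dog zebra umbrella
# 0 0 256 200 a photo of a single cat
# 256 0 512 200 a photo of a single dog
# 0 200 256 512 a photo of a single zebra
# 256 200 512 512 a photo of a single umbrella
```

The four boxes tile the canvas exactly, so every pixel belongs to one region and therefore to one prompt; how the real pipeline chooses split points or handles region borders is not described here.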
- Clone our repo from GitHub:
git clone https://github.com/Jiahao000/MosaicFusion.git
cd MosaicFusion
- Create the conda environment:
conda env create -f environment.yml
- Download lvis_v1_train.json, unzip it, and place it under a directory, e.g., data/lvis/meta/lvis_v1_train.json.
- Generate images and masks with MosaicFusion:
bash scripts/dist_text2seg.sh "a photo of a single category" output/text2seg Generation_log
Alternatively, if you run MosaicFusion on a cluster managed by Slurm:
bash scripts/slurm_text2seg.sh Dummy Generation_job "a photo of a single category" output/text2seg Generation_log
- Convert generated images and masks to the required data format:
bash scripts/run_seg2ann.sh output/text2seg output/seg2ann
- Merge MosaicFusion annotations into LVIS annotations:
bash scripts/run_merge_ann.sh data/lvis/meta/lvis_v1_train.json output/seg2ann/annotations/lvis_v1_train_mosaicfusion.json output/seg2ann/annotations/lvis_v1_train_merged.json
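The generation, conversion, and merging steps run in sequence, each consuming the previous step's output. A hypothetical wrapper that chains them and stops at the first failure might look like the following; mosaicfusion_pipeline, mf_run, and the DRY_RUN switch are illustrative names added here, not options of the repository's scripts, and the prompt and paths are the example values above. On a Slurm cluster, the first command would be swapped for the slurm_text2seg.sh call.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repo): chains generation -> conversion
# -> merging with the README's example paths and stops at the first failure.

# mf_run: print the command when DRY_RUN=1 (an illustrative switch added
# here), otherwise execute it.
mf_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

mosaicfusion_pipeline() {
  local prompt="a photo of a single category"
  local gen_dir="output/text2seg"
  local ann_dir="output/seg2ann"
  local lvis_json="data/lvis/meta/lvis_v1_train.json"

  # Fail early if the LVIS training annotations have not been downloaded.
  if [ "${DRY_RUN:-0}" != 1 ] && [ ! -f "$lvis_json" ]; then
    echo "missing $lvis_json; download and unzip lvis_v1_train.json first" >&2
    return 1
  fi

  # 1. Generate images and masks.
  mf_run bash scripts/dist_text2seg.sh "$prompt" "$gen_dir" Generation_log || return 1
  # 2. Convert them to the required annotation format.
  mf_run bash scripts/run_seg2ann.sh "$gen_dir" "$ann_dir" || return 1
  # 3. Merge the MosaicFusion annotations into the LVIS annotations.
  mf_run bash scripts/run_merge_ann.sh "$lvis_json" \
    "$ann_dir/annotations/lvis_v1_train_mosaicfusion.json" \
    "$ann_dir/annotations/lvis_v1_train_merged.json" || return 1
}
```

After defining these functions in a shell at the repository root, DRY_RUN=1 mosaicfusion_pipeline prints the three commands without running them, and mosaicfusion_pipeline executes them.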
Please refer to TRAIN.md for training details.
- Data generation code for MosaicFusion
- Third-party training code with MosaicFusion data
If you find this work useful for your research, please consider citing our paper:
@article{xie2024mosaicfusion,
author = {Xie, Jiahao and Li, Wei and Li, Xiangtai and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
title = {MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation},
journal = {International Journal of Computer Vision},
year = {2024}
}
Distributed under the S-Lab License. See LICENSE for more information.