This repository contains a PyTorch implementation of our ICCV 2021 paper Image Synthesis from Layout with Locality-Aware Mask Adaption.
This paper is concerned with synthesizing images conditioned on a layout (a set of bounding boxes with object categories). Existing works construct a layout-mask-image pipeline: object masks are generated separately and mapped to bounding boxes to form a whole semantic segmentation mask (layout-to-mask), from which a new image is generated (mask-to-image). However, overlapping boxes in a layout produce overlapping object masks, which reduces mask clarity and causes confusion in image generation.
We hypothesize that generating clean and semantically clear masks is important. The hypothesis is supported by the finding that the performance of the state-of-the-art LostGAN decreases when input masks are tainted. Motivated by this hypothesis, we propose the Locality-Aware Mask Adaption (LAMA) module to adapt overlapped or nearby object masks during generation. Experimental results show that our proposed model with LAMA outperforms existing approaches in visual fidelity and alignment with input layouts. On COCO-Stuff at 256×256, our method improves the state-of-the-art FID score from 41.65 to 31.12 and the SceneFID from 22.00 to 18.64.
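The overlap problem described above can be illustrated with a small sketch. It is not the code used in this repository: the function names are hypothetical, and each object mask is simplified to a filled rectangle covering its bounding box. It only counts the pixels that more than one object claims when boxes are pasted onto a shared canvas.

```python
# Illustrative sketch only (hypothetical names, not this repository's code).
# Boxes are (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
# Assumption: every object mask simply fills its whole bounding box.

def paste_box_masks(size, boxes):
    """Count how many object boxes cover each pixel of a size x size canvas."""
    coverage = [[0] * size for _ in range(size)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                coverage[y][x] += 1
    return coverage


def ambiguous_pixels(coverage):
    """Number of pixels claimed by more than one object."""
    return sum(1 for row in coverage for count in row if count > 1)


layout = [(0, 0, 4, 4), (2, 2, 6, 6)]  # two hypothetical overlapping boxes
coverage = paste_box_masks(8, layout)
print(ambiguous_pixels(coverage))  # the 2x2 overlap region gives 4 pixels
```

Pixels with a count above one are where a layout-to-mask step has to decide which object the pixel belongs to, which is the ambiguity that mask adaptation is meant to reduce.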
The code is tested on Ubuntu 16.04 with CUDA 10.01 and an NVIDIA RTX 2080 Ti. It is written in PyTorch 1.6, and the conda environment specifications are provided in LAMA.yaml, LAMA_tf.yaml and LAMA_YOLO.yaml.
We provide pre-trained models of COCO and VG in Google Drive and
Please put all pretrained models under pretrained_models/
Create an environment in conda
conda env create -f LAMA_tf.yaml
conda env create -f LAMA.yaml
conda activate LAMA
pip install tensorboardX pycocotools
python build develop
Download COCO dataset to datasets/coco
bash scripts/
Download VG dataset to datasets/vg
bash scripts/
python scripts/
The training process uses PyTorch's DistributedDataParallel module.
conda activate LAMA
export CUDA_VISIBLE_DEVICES=0; python -m torch.distributed.launch --nproc_per_node=1 --img_size 128 --batch_size 20 --out_path experiment/coco_128/
With multiple GPUs, the training command can be
export CUDA_VISIBLE_DEVICES=0,1,2,3; python -m torch.distributed.launch --nproc_per_node=4 --img_size 128 --batch_size 20 --out_path experiment/coco_128/
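For readers unfamiliar with the launcher used above: torch.distributed.launch starts one process per GPU and, by default, passes each process its index through a --local_rank argument while exporting WORLD_SIZE in the environment. The sketch below shows how a launched script could read these values. It is an assumption about a typical script, not this repository's training code; the function names and the default values are hypothetical, and only the flag names are taken from the commands above.

```python
import argparse
import os


def parse_launch_args(argv=None):
    """Parse the arguments a script receives from torch.distributed.launch."""
    parser = argparse.ArgumentParser()
    # The launcher appends --local_rank=<index> to the script arguments.
    parser.add_argument("--local_rank", type=int, default=0)
    # Flag names from the commands above; the defaults are assumptions.
    parser.add_argument("--img_size", type=int, default=128)
    parser.add_argument("--batch_size", type=int, default=20)
    parser.add_argument("--out_path", type=str, default="experiment/coco_128/")
    return parser.parse_args(argv)


def world_size(env=None):
    """Total number of launched processes, read from WORLD_SIZE (1 if unset)."""
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1"))


# Simulate what the process for the third GPU would receive.
args = parse_launch_args(["--local_rank=2", "--img_size", "128", "--batch_size", "20"])
print(args.local_rank, args.batch_size, world_size({"WORLD_SIZE": "4"}))
```

Each process would normally bind to the GPU given by its local rank before wrapping the model in DistributedDataParallel.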
We provide examples of using the pretrained models and calculating the evaluation metrics.
python --dataset coco --model_path pretrained_models/coco_128.pth --sample_path samples/ --gpu 1
conda activate LAMA_tf
python scores/ samples/coco128_repeat5_thres2.0/ --gpu 0
The validation images are extracted, with which the
conda activate LAMA
python utils/ --dataset coco --img_size 128
conda activate LAMA_tf
python scores/ datasets/coco/val_128/ samples/coco128_repeat5_thres2.0/ --gpu 0 --lowprofile
conda activate LAMA
python --dataset coco --model_path pretrained_models/coco_128.pth --DS -r 2 -N --img_size 128 --gpu 0
We first extract cropped objects from the dataset and crop the corresponding objects from the generated images. The SceneFID is then computed between the two sets of crops.
conda activate LAMA
python utils/ --dataset coco --img_size 128 --cropped_size 224
python --dataset coco --model_path pretrained_models/coco_128.pth --img_size 128 -N --cropped_size 224 --sample_path samples/cropped_224/ --gpu 0
conda activate LAMA_tf
python scores/ datasets/coco/val_128_cropped_224 samples/cropped_224/ --gpu 0 --lowprofile
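The object-cropping step used for SceneFID can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name is hypothetical, and boxes are assumed to be given as normalized (x, y, w, h) values in [0, 1]. The sketch converts a box into a pixel rectangle that stays inside the image and is at least one pixel wide and tall; resizing the crop to --cropped_size would happen afterwards.

```python
def crop_rectangle(box, img_size):
    """Convert a normalized (x, y, w, h) box to a clamped pixel rectangle.

    Returns (left, top, right, bottom) with right and bottom exclusive.
    Assumption: box coordinates are fractions of the image side length.
    """
    x, y, w, h = box
    left = max(0, min(img_size - 1, int(round(x * img_size))))
    top = max(0, min(img_size - 1, int(round(y * img_size))))
    # Keep at least one pixel so that tiny boxes still yield a valid crop.
    right = max(left + 1, min(img_size, int(round((x + w) * img_size))))
    bottom = max(top + 1, min(img_size, int(round((y + h) * img_size))))
    return left, top, right, bottom


print(crop_rectangle((0.25, 0.5, 0.5, 0.25), 128))  # (32, 64, 96, 96)
```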
We use the implementation of classification from The validation accuracy in the last epoch is taken as CAS score.
conda activate LAMA
git clone
cd pytorch_image_classification
git clone
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
pip install thop==0.0.31.post2004070130
pip install fvcore termcolor yacs
cd ../..
Generate training and testing sets.
conda activate LAMA
python utils/ --dataset coco --img_size 128 --cropped_size 32
python --dataset coco --model_path pretrained_models/coco_128.pth --img_size 128 -N --cropped_size 32 --sample_path samples/cropped_32/ --gpu 0
Run classification.
cd pytorch_image_classification
mkdir coco_128
cd coco_128
ln -s ../../datasets/coco/val_128_cropped_32/ val
ln -s ../../samples/cropped_32/coco128_repeat5_thres2.0_cropped_32/ train
cd ..
mkdir experiments
sed -i '/macs/d'
sed -i '/n_params/d'
python --config configs/cifar/resnet.yaml ImageNet dataset.dataset_dir coco_128/ train.output_dir experiments/coco_128/ dataset.n_classes 184
cd ..
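In the setup above, the classifier is trained on crops from generated images (train) and validated on crops from real images (val), so the CAS is an ordinary top-1 accuracy on the real crops. The sketch below shows that final computation only. It is an illustration with a hypothetical function name, not the code of the classification package used here.

```python
def top1_accuracy(predicted, target):
    """Fraction of validation crops whose predicted class equals the label."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        raise ValueError("at least one sample is required")
    correct = sum(1 for p, t in zip(predicted, target) if p == t)
    return correct / len(target)


# Hypothetical class indices for four validation crops.
print(top1_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
```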
conda activate LAMA
python --dataset coco --model_path pretrained_models/coco_128.pth --sample_path samples/ -r 1 --image_id_savepath image_id.txt
cd yolo_experiments
conda env create -f LAMA_YOLO.yaml
git clone
Ground truth
cp ../datasets/coco/annotations/instances_val2017.json data
conda activate LAMA_YOLO
cd data
git clone
cd cocoapi/PythonAPI
python install
cd ../../..
The last line returns the terminal to yolo_experiments/.
In the command below, --image_path is the path to the generated images and --imageid_path is the file recording the order of the generated pictures.
cd data
conda activate LAMA_YOLO
ln -s ../../datasets/coco/val2017/ val2017
python --imageid_path ../../image_id.txt --image_path ../../samples/coco128_repeat1_thres2.0
Note that image_id.txt specifies the validation layout of each generated image. The generated images are named sample_0.jpg, sample_1.jpg, and so on, following the order in image_id.txt.
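The naming convention above can be sketched as a simple lookup from image id to generated file name. This is a hedged illustration: the function name is hypothetical, and it assumes image_id.txt holds one image id per line, which is not confirmed here.

```python
def sample_names_by_image_id(id_lines):
    """Map each image id to its generated file name, in file order.

    Assumption: one image id per line; blank lines are ignored.
    """
    ids = [line.strip() for line in id_lines if line.strip()]
    return {image_id: f"sample_{index}.jpg" for index, image_id in enumerate(ids)}


# Hypothetical contents of image_id.txt.
mapping = sample_names_by_image_id(["139\n", "285\n", "632\n"])
print(mapping["285"])  # sample_1.jpg
```

With a real file, the lines could be read with open("image_id.txt") and passed to the same function.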
This paper is supported by the National Science and Technology Innovation 2030 Major Project (2018AAA0100703) of the Ministry of Science and Technology of China, the National Natural Science Foundation of China (61773336, 62006208), and the Provincial Key Research and Development Plan of Zhejiang Province (2019C03137). Zejian Li would like to thank Pei Chen and Yongxing He at Zhejiang University for helpful comments, and Wei Sun for kindly answering questions regarding LostGANs.
- LostGAN:
- Image Generation from Scene Graphs:
- Faster R-CNN and Mask R-CNN in PyTorch 1.0:
- YOLOv4:
- CAS:
author = {Zejian Li and Jingyu Wu and Immanuel Koh and Yongchuan Tang and Lingyun Sun},
title = {Image Synthesis from Layout with Locality-Aware Mask Adaption},
year = {2021},
publisher = {IEEE},
pages = {13819--13828},
booktitle = {IEEE International Conference on Computer Vision (ICCV)}