"To infinity and beyond!"
Chieh Hubert Lin1, Hsin-Ying Lee2, Yen-Chi Cheng3, Sergey Tulyakov2, Ming-Hsuan Yang1,4
1UC Merced, 2Snap Research, 3CMU, 4Google Research
Abstract
We present a novel framework, InfinityGAN, for arbitrary-sized image generation. The task is associated with several key challenges. First, scaling existing models to an arbitrarily large image size is resource-constrained, in terms of both computation and availability of large-field-of-view training data. InfinityGAN trains and infers in a seamless patch-by-patch manner with low computational resources. Second, large images should be locally and globally consistent, avoid repetitive patterns, and look realistic. To address these, InfinityGAN disentangles global appearances, local structures, and textures. With this formulation, we can generate images with spatial size and level of details not attainable before. Experimental evaluation validates that InfinityGAN generates images with superior realism compared to baselines and features parallelizable inference. Finally, we show several applications unlocked by our approach, such as spatial style fusion, multi-modal outpainting, and image inbetweening. All applications can be operated with arbitrary input and output sizes.
[Project Page] [Paper] [Supplementary]
(*These samples are downsampled; please access the raw images via [Google Drive](https://drive.google.com/drive/folders/1Ej3dgWVagitJR7FYtrlEWu3DSC2cOvJi?usp=sharing).)
- A. Configure Environment
- B. Prepare Data
- C. Train Model
- D. Test Model
- E. Interactive Generation
- F. Evaluation
- G. Pretrained Models And Additional Materials
Our repository works on Ubuntu. (One of our machine setups: Ubuntu + Python 3.8.5 + cudatoolkit 10.2)
Setup:
- Create the conda environment with `conda env create --name pt16 --file meta_data/environment.yml`. We only tested our pipeline on PyTorch 1.6. Please avoid PyTorch 1.7 and 1.8, as we observe an unexplained degradation in performance.
- (Alternative) Install directly with `conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch`, `conda install python-lmdb tqdm matplotlib imageio scikit-image scikit-learn scipy=1.5`, and `pip install tensorboardx==2.1 pyyaml==5.4.1 easydict`.
- In `env_config.py`, designate the directory where you are going to place all your lmdb datasets.
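Since the pipeline was only tested on PyTorch 1.6, a small guard at startup can catch a mismatched environment early. The following is a minimal sketch, not part of the repository; the helper name `is_tested_torch` is hypothetical.

```python
def is_tested_torch(version: str) -> bool:
    """Return True if `version` names a PyTorch 1.6.x release."""
    # Drop local build suffixes such as "+cu102" before comparing.
    release = version.split("+")[0]
    return release.split(".")[:2] == ["1", "6"]


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if not is_tested_torch(torch.__version__):
            print(f"Warning: found PyTorch {torch.__version__}; "
                  "the pipeline was only tested on 1.6.")
```

The string comparison is deliberately strict, so that a release such as `1.16.0` is not mistaken for `1.6`.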
P.S. Theoretically, this repository should work on Windows if you manage to run StyleGAN2, which requires extra effort to build the CUDA code with Visual Studio.
Notes: We originally used "Flickr-Landscape (small)" in the V1 paper on arXiv, then updated the results of all models to "Flickr-Landscape (large)" in later versions of the paper. We only use the training split for all training and FID evaluation. Nevertheless, we still provide a validation set in "Flickr-Landscape (large)". Note that the Flickr-Landscape dataset contains images of different sizes, which are not aligned in the lmdb, so you may add customized training augmentations if desired.
- Use prepared lmdb:
Dataset | Used in latest paper | # images | Minimum image size | All images same shape? | Size | Has holdout set? | Link |
---|---|---|---|---|---|---|---|
Flickr-Landscape (small) | X | 50,000 | 1024 | X | 89G | X | (Google Drive) |
Flickr-Landscape (large) | V | 400,000 | 1024 | X | 786G | V | (Google Drive) |
Flickr-Scenery | V | 54,710 | 256 | V | 3.5G | V | (Will release via In&Out) |
Places2-Scenery-Subset | V | 56,431 | 256 | V | 3.2G | V | (Will release via In&Out) |
- Construct your custom dataset
  - Prepare a config similar to `configs/dataset/flickr-landscape-small.yaml`.
  - Run (we only use the training set) `python prepare_data.py ./configs/dataset/flickr-landscape-small.yaml --train_only`
  - The lmdb will be constructed at the `LMDB_ROOTS` you specified in `env_config.py`.
  - Remember to modify the `data_params.dataset` flag in your training config when you train the model.
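To check that a dataset ended up where the training config expects it, a lookup like the one below may help. This is only a sketch under assumptions: it treats `LMDB_ROOTS` as a list of directories and assumes each dataset lives in a subfolder named after it, neither of which is confirmed by the text above. The helper `find_lmdb` is hypothetical.

```python
import os
from typing import Optional, Sequence


def find_lmdb(dataset_name: str, lmdb_roots: Sequence[str]) -> Optional[str]:
    """Return the first `<root>/<dataset_name>` directory that exists, else None.

    Assumption: each lmdb is stored in a folder named after the dataset.
    """
    for root in lmdb_roots:
        candidate = os.path.join(root, dataset_name)
        if os.path.isdir(candidate):
            return candidate
    return None
```

A `None` result suggests that either the root directory or the dataset name in the training config needs another look.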
Our pipeline requires specifying `CUDA_VISIBLE_DEVICES`, and automatically switches to dataparallel if two or more GPUs are specified.
- InfinityGAN:
CUDA_VISIBLE_DEVICES="0" python train.py ./configs/model/InfinityGAN.yaml
- StyleGAN2 + NCI:
CUDA_VISIBLE_DEVICES="0" python train.py ./configs/model/StyleGAN2_NCI.yaml
- StyleGAN2 + NCI + FCG:
CUDA_VISIBLE_DEVICES="0" python train.py ./configs/model/StyleGAN2_NCI_FCG.yaml
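The GPU selection rule described above (require `CUDA_VISIBLE_DEVICES`, use data parallelism with two or more GPUs) can be sketched as follows. This illustrates the rule only and is not the code in `train.py`; both function names are hypothetical.

```python
import os
from typing import List, Mapping


def visible_gpu_ids(env: Mapping[str, str] = os.environ) -> List[str]:
    """Parse CUDA_VISIBLE_DEVICES into a list of device id strings."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        raise RuntimeError("Please specify CUDA_VISIBLE_DEVICES.")
    # Ignore empty tokens so that "0," or " 0 , 1 " still parse cleanly.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def should_use_dataparallel(env: Mapping[str, str] = os.environ) -> bool:
    """Two or more visible GPUs switch the run to data parallelism."""
    return len(visible_gpu_ids(env)) >= 2
```

Passing the environment as an argument keeps the check easy to exercise without touching the real process environment.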
Misc flags of `train.py`:
- `--debug`: With this flag, the training pipeline runs one training iteration, executes all logging and evaluation for one iteration, then quits without writing anything to your logs. This is useful when you just want to test your environment or config.
- `--archive-mode`: Our pipeline automatically backs up your code at `./logs/<exp_name>/codes/`. You may run the model training within that folder by using this flag.
Our pipeline requires specifying `CUDA_VISIBLE_DEVICES`, and automatically switches to dataparallel if two or more GPUs are specified.
Suppose you have a model trained with the config `./configs/model/<OuO>.yaml` and want to generate images at `HxW` resolution. The testing commands are written as follows:
- Naive Generation
  Directly synthesizes the whole image. `O(H*W)` memory allocation.
  CUDA_VISIBLE_DEVICES="0,1" python test.py \
    --model-config=./configs/model/<OuO>.yaml \
    --test-config=./configs/test/direct_gen_HxW.yaml
- Infinite Generation
  Sequentially generates patches. `O(1)` memory allocation.
  CUDA_VISIBLE_DEVICES="0,1" python test.py \
    --model-config=./configs/model/<OuO>.yaml \
    --test-config=./configs/test/infinite_gen_HxW.yaml
- Spatial Fusion Generation
  Spatially fuses multiple styles. Follows the "infinite generation" design.
  CUDA_VISIBLE_DEVICES="0,1" python test.py \
    --model-config=./configs/model/<OuO>.yaml \
    --test-config=./configs/test/fused_gen_HxW.yaml
- Inversion
  Please remember to update `override_dataset_data_size` and `override_dataset_full_size` if the resolution of the real image to invert is different from the training resolution.
  CUDA_VISIBLE_DEVICES="0" python test.py \
    --model-config="./configs/model/<OuO>.yaml" \
    --test-config="./test_configs/inversion_<???>.yaml"
- Outpainting
  Invert the latent variables, and outpaint the image.
  # Run inversion first
  CUDA_VISIBLE_DEVICES="0" python test.py \
    --model-config="./configs/model/<OuO>.yaml" \
    --test-config="./test_configs/inversion_256x256_L2R.yaml"
  # Then outpaint
  CUDA_VISIBLE_DEVICES="0" python test.py \
    --model-config="./configs/model/<OuO>.yaml" \
    --test-config="./test_configs/outpaint_with_fused_gen_256x256.yaml" \
    --inv-records="./logs/<OuO>/test/outpaint_with_fused_gen_256x256/stats/<id>.pkl" \
    --inv-placements=0.5,0.25
- Inbetweening
  Invert the latent variables, and outpaint the image.
  # Run inversion first
  CUDA_VISIBLE_DEVICES="0" python test.py \
    --model-config="./configs/model/<OuO>.yaml" \
    --test-config="./test_configs/inversion_IOF246_256x1280L_256x128.yaml"
  CUDA_VISIBLE_DEVICES="0" python test.py \
    --model-config="./configs/model/<OuO>.yaml" \
    --test-config="./test_configs/inversion_IOF246_256x1280R_256x128.yaml"
  # Then outpaint (the `inv-records` and `inv-placements` are ordered lists separated with `:`)
  CUDA_VISIBLE_DEVICES="0" python test.py \
    --model-config="./configs/model/<OuO>.yaml" \
    --test-config="./test_configs/inbetween_with_fused_gen_256x1280.yaml" \
    --inv-records="./logs/<OuO>/test/inversion_IOF246_256x1280L_256x128/stats/<id>.pkl:./logs/<OuO>/test/inversion_IOF246_256x1280R_256x128/stats/<id>.pkl" \
    --inv-placements=0.5,0.05:0.5,0.95
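The `:`-separated format of `--inv-records` and `--inv-placements` can be made concrete with a small parser. This is a sketch of the format as described in the command above, not the parsing code used by `test.py`; the function names are hypothetical, and the meaning of the two numbers in each placement is left as given by the repository.

```python
from typing import List, Tuple


def parse_placements(arg: str) -> List[Tuple[float, float]]:
    """Parse "a,b:c,d" into [(a, b), (c, d)], keeping the given order."""
    placements = []
    for item in arg.split(":"):
        first, second = item.split(",")
        placements.append((float(first), float(second)))
    return placements


def pair_records(records_arg: str, placements_arg: str) -> List[Tuple[str, Tuple[float, float]]]:
    """Match the i-th record file with the i-th placement."""
    records = records_arg.split(":")
    placements = parse_placements(placements_arg)
    if len(records) != len(placements):
        raise ValueError("inv-records and inv-placements must have the same length")
    return list(zip(records, placements))
```

Because both lists are ordered, swapping two record paths without swapping their placements would pair each inverted image with the wrong position.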
P.S. Unfortunately, you need to invert the latent variables again each time you change the inversion area size, the position of the inversion area, or the outpainting target resolution. This is because (i) the inversion area of the real image, (ii) the inversion area of the generated image, and (iii) the position of the inverted latents during outpainting can all differ (along with some further technical difficulties).
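The memory contrast between naive generation (`O(H*W)`) and infinite generation (`O(1)`) can be illustrated with a plain tiling loop. The sketch below is a simplification and not the repository's implementation: it only enumerates patch boxes and says nothing about how InfinityGAN keeps neighbouring patches consistent. `iter_patches` is a hypothetical helper.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def iter_patches(height: int, width: int, patch: int) -> Iterator[Box]:
    """Yield boxes that tile a height x width canvas in row-major order."""
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Clip the last row/column of patches to the canvas border.
            yield (top, left, min(top + patch, height), min(left + patch, width))


if __name__ == "__main__":
    for box in iter_patches(1024, 1024, 256):
        pass  # generate, save, and discard one patch here
```

Since the generator yields one box at a time, only a single patch needs to be held in memory regardless of the canvas size.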
- `lowres_height`: High-resolution images are hard to download from a remote machine, so we additionally save a low-resolution version of each image, downsampled to the specified height while preserving the aspect ratio.
- `interactive`: See below.
- `parallel_batch_size`: The "parallel batching" application mentioned in the paper.
  - Supports `test_manager.infinte_generation` and `test_manager.fused_generation`.
  - The implementation is at `test_manager.base_test_manager.py:maybe_parallel_inference()`.
  - Although `batch_size` could be supported simultaneously, we make the two mutually exclusive, as mixing these two batching strategies is not meaningful.
- `--speed-benchmark`: Collects GPU execution time (including dataparallel scatter and collection time). Ignores the first ten iterations.
- `--calc-flops`: Gets the total FLOPs used in synthesizing a full image.
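Discarding the first few iterations, as `--speed-benchmark` does, is a common way to keep one-off warm-up costs out of a timing average. Below is a minimal CPU-only sketch of that idea; it is not the repository's benchmark code, and `benchmark` is a hypothetical helper. For GPU work, the callable would also need to synchronize (e.g., `torch.cuda.synchronize()`) before returning, so that asynchronous kernels are included in the measurement.

```python
import time
from typing import Callable, List


def benchmark(step: Callable[[], None], iters: int = 30, warmup: int = 10) -> float:
    """Return the mean wall-clock seconds per call of `step`, skipping warm-up calls."""
    if iters <= warmup:
        raise ValueError("iters must exceed warmup")
    timings: List[float] = []
    for i in range(iters):
        start = time.perf_counter()
        step()
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard the first `warmup` iterations
            timings.append(elapsed)
    return sum(timings) / len(timings)
```

The warm-up calls are still executed; only their timings are left out of the average.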
Set `interactive: True` in the config, or equivalently pass `--interactive` on the command line to `test.py`.
The interactive generation is supported for the following `test_manager` classes:
- (for infinite generation) `test_manager.infinte_generation`
- (for spatial fusion generation, outpainting, and inbetweening) `test_manager.fused_generation`
How to use:
- Selection: Left-click on the image two times to create a red selection bounding box that designates an area to re-sample. Right-click on the image two times to create a blue selection bounding box that designates an area to extract channel-wise statistics for the re-sampling mean and standard deviation (default is zero-mean and unit-variance if no blue boxes are selected).
- Sampling: Select the variables to resample. Only the spatially shaped latent variables (e.g., local latent and noises) can be regionally sampled with selection area.
- Undo/redo: Supports up to 100 steps of undo/redo. You may increase the value if you have a sufficient amount of CPU memory.
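The red/blue box behaviour described above can be sketched in plain Python: channel-wise statistics are taken from one region (blue box) and used to re-sample another (red box), with zero mean and unit variance as the fallback. This is an illustration under assumptions, not the interactive tool's code. The latent is modelled as nested lists indexed `latent[channel][row][col]`, and both function names are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive
Latent = List[List[List[float]]]  # indexed as latent[channel][row][col]


def channel_stats(latent: Latent, box: Box) -> List[Tuple[float, float]]:
    """Per-channel (mean, std) of the values inside `box` (the blue box)."""
    top, left, bottom, right = box
    stats = []
    for channel in latent:
        values = [v for row in channel[top:bottom] for v in row[left:right]]
        stats.append((statistics.fmean(values), statistics.pstdev(values)))
    return stats


def resample_region(latent: Latent, box: Box,
                    stats: Optional[Sequence[Tuple[float, float]]] = None,
                    rng: Optional[random.Random] = None) -> None:
    """Overwrite the values inside `box` (the red box) in place with Gaussian samples."""
    rng = rng or random.Random()
    top, left, bottom, right = box
    for c, channel in enumerate(latent):
        # Default to zero mean and unit variance when no statistics are given.
        mean, std = stats[c] if stats is not None else (0.0, 1.0)
        for r in range(top, bottom):
            for col in range(left, right):
                channel[r][col] = rng.gauss(mean, std)
```

Values outside the red box are left untouched, which mirrors the point that only spatially shaped latent variables can be re-sampled regionally.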
Note: If the image is too large for your monitor (e.g., 4096x4096), you can increase `self.fov_rescale` to 2 or 4, which downsamples the image before displaying it in the canvas (you are still interacting with the image at its original resolution).
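One way to picture the effect of `fov_rescale` is as an integer factor between canvas coordinates and image coordinates. The sketch below assumes a simple integer downsampling and a direct multiplication back to image space; the actual mapping inside the interface may differ, and both helpers are hypothetical.

```python
from typing import Tuple


def canvas_size(image_hw: Tuple[int, int], fov_rescale: int) -> Tuple[int, int]:
    """Size of the displayed canvas for an image of (height, width)."""
    height, width = image_hw
    return (height // fov_rescale, width // fov_rescale)


def canvas_to_image(xy: Tuple[int, int], fov_rescale: int) -> Tuple[int, int]:
    """Map a click on the downsampled canvas back to original image coordinates."""
    x, y = xy
    return (x * fov_rescale, y * fov_rescale)
```

For example, with a factor of 4, a 4096x4096 image would be shown on a 1024x1024 canvas while selections still refer to the full-size image.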
P.S. To quit the program, you need to close the interface window and kill (ctrl-c) the program in the terminal.
To test the model with x2 ScaleInv FID:
CUDA_VISIBLE_DEVICES="0,1" python eval_fids.py \
./configs/model/<exp_name>.yaml \
--type=scaleinv \
--scale=2 \
--batch-size=2
Other arguments:
- `--ckpt`: By default, we test the checkpoint at `./logs/<exp_name>/ckpt/best_fid.pth.tar`. You may override the path with this argument if you want to test other checkpoints.
- `--img-folder`: You may use this if you want to test with a folder of images.
- `--type`: We also implemented another FID scheme, `spatial`, which partitions the image into 16 (=4x4) patches, extracts Inception features for each patch, and concatenates them into a plain vector. This is much slower and consumes massive CPU memory, and the trend (FID vs. scale) is similar to ScaleInv FID.
- `--seq-inference`: For InfinityGAN, due to the additional structure synthesizer, the model can run out of memory at higher resolutions if it generates the image in one shot. You may use this flag to enable sequential inference (i.e., use `test_managers.infinite_generation.py`), but this slows down the inference due to some internal redundant computations.
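The `spatial` scheme's partition-and-concatenate step can be sketched independently of the Inception network. In the example below, the feature extractor is a stand-in passed by the caller (the real scheme uses Inception features), the image is a single-channel nested list, and both function names are hypothetical.

```python
from typing import Callable, List

Image = List[List[float]]  # a single-channel image as rows of pixel values


def split_into_grid(image: Image, grid: int = 4) -> List[Image]:
    """Split an image into grid x grid equally sized patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    ph, pw = height // grid, width // grid
    return [
        [row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
        for r in range(grid)
        for c in range(grid)
    ]


def spatial_features(image: Image,
                     extract: Callable[[Image], List[float]],
                     grid: int = 4) -> List[float]:
    """Concatenate the per-patch feature vectors into one plain vector."""
    features: List[float] = []
    for patch in split_into_grid(image, grid):
        features.extend(extract(patch))
    return features
```

With 16 patches, the concatenated vector is 16 times longer than a single feature vector, which is consistent with the higher memory cost noted above.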
(This script is based on the codes from In&Out)
Please run the inversion first. It will store the results (images and inverted variables) at `./logs/<exp_name>/test/<test_name>/`; then you can evaluate with the following command:
CUDA_VISIBLE_DEVICES="0" python eval_outpaint_imgdir.py \
--batch=48 \
--size=256 \
--real-dir=./logs/<exp_name>/test/<test_name>/imgs/real_gt/ \
--fake-dir=./logs/<exp_name>/test/<test_name>/imgs/inv_cmp/
Note that this script only supports a single GPU.
You should structure the `./logs/` folder like this:
logs/ --+--- exp_name_A/ --- ckpt/ --- best_fid.pth.tar
|
+--- exp_name_B/ --- ckpt/ --- best_fid.pth.tar
|
+--- exp_name_C/ --- ckpt/ --- best_fid.pth.tar
|
(...)
You should be able to find the corresponding config for each released model under `./configs/model/`. You can test a model with:
CUDA_VISIBLE_DEVICES="0,1" python test.py \
  --model-config=./configs/model/<exp_name>.yaml \
--test-config=./configs/test/infinite_gen_1024x1024.yaml
The test script will auto-detect the checkpoint at ./logs/<exp_name>/ckpt/best_fid.pth.tar
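Before launching a test run, it can be useful to confirm that the `./logs/` layout matches the expected checkpoint path. The following sketch only mirrors the path convention stated above; it is not the detection logic inside `test.py`, and both helpers are hypothetical.

```python
from pathlib import Path
from typing import List


def default_ckpt(exp_name: str, log_root: str = "./logs") -> Path:
    """Path where the best-FID checkpoint of an experiment is expected."""
    return Path(log_root) / exp_name / "ckpt" / "best_fid.pth.tar"


def list_ready_experiments(log_root: str = "./logs") -> List[str]:
    """Names of experiment folders that already contain the expected checkpoint."""
    root = Path(log_root)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if default_ckpt(p.name, log_root).is_file()
    )
```

An experiment missing from the returned list most likely has its checkpoint under a different folder or file name.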
Name | Dataset | Used in paper | Training full image size | Training patch size | Trained w/ #GPUs | Link |
---|---|---|---|---|---|---|
InfinityGAN | Flickr-Landscape (large) | V | 197 | 101 | 1x TitanX | (Google Drive) |
InfinityGAN-HR | Flickr-Landscape (large) | X | 389 | 197 | 4x V100 | (Google Drive) |
InfinityGAN-UR | Flickr-Landscape (large) | X | 1024 | 773 | 4x V100 | (Google Drive) |
InfinityGAN-IOF | InOut-Flickr-Scenery | V | 197 | 101 | 1x TitanX | (Google Drive) |
InfinityGAN-IOP | InOut-Places2-Scenery-subset | V | 197 | 101 | 1x TitanX | (Google Drive) |
Inverting a large set of samples requires a large amount of computation. In order to save your time and our earth (just a bit), we release the inversion results here.
The tar file (decompress with `tar zxf <filename>.tar`) contains the following folders:
---+---inv_cmp/ : Compare (left-half) real and (right-half) reconstruction via inversion.
|
+---inv_comp_cropped/ : Composed (left-half) real and (right-half) outpainting via inversion.
|
+---inv_raw/ : The whole inverted image.
|
+---real_gt/ : The real data.
Note: You may notice that there is a `cropped` in the folder name. InfinityGAN actually inverts images slightly larger than the conditional image, then crops those areas away in the end.
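For those who prefer to unpack the released archive from Python rather than with `tar zxf`, the standard `tarfile` module can do the same job. This is a minimal sketch; `extract_archive` is a hypothetical helper, and the archive and destination paths are placeholders supplied by the caller.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive: str, dest: str) -> List[str]:
    """Extract `archive` into `dest` and return the top-level entry names."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip here) transparently.
    # Only extract archives from sources you trust.
    with tarfile.open(archive, "r:*") as tar:
        tar.extractall(dest)
    return sorted(p.name for p in Path(dest).iterdir())
```

After extraction, the returned names should match the folder list above (`inv_cmp`, `inv_comp_cropped`, `inv_raw`, `real_gt`).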
- The performance on PyTorch 1.4/1.7 differs from that on PyTorch 1.6. The root cause is unknown and still a mystery to us, so please use PyTorch 1.6 if possible.
- Please do not use dataparallel across two different types of GPUs (e.g., data parallel with GTX1080 + GTX2080); one of the GPUs may generate gray or blank images.
- OOM while training with a single GPU. PyTorch can sometimes raise OOM due to unfortunate memory allocations. Here are some tweaks that sometimes resolve the problem (if you indeed have only one GPU):
  - [Reminder] We use "TITAN X (Pascal)" with 12196 MB of GPU memory in the single-GPU setup in our paper. We are not certain about the results on other GPUs with less memory.
  - Set `torch.backends.cudnn.benchmark` to `False` in `train.py`. Although it is not designed to cause OOM, it sometimes does.
  - Set `calc_fid` and `calc_fid_ext2` to `False`. The evaluation allocates additional memory and uses a different memory allocation pattern, which can mess up the PyTorch memory allocation schedule. You may directly use the last iteration, as all models converge well.
  - Set `ext_mult_list` to `[2,]` instead of `[2, 4]`, which stops logging images generated at 4x testing resolution during training.
  - Reduce the training `batch_size`. However, this may influence the model performance.
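The config-related tweaks above can be collected into one helper that derives a low-memory variant of a training config. This is a sketch under assumptions: the config is treated as a flat dictionary, whereas the real YAML files may nest these keys differently, and `low_memory_overrides` is a hypothetical name. The `torch.backends.cudnn.benchmark` switch is set in `train.py` and is therefore not covered here.

```python
import copy
from typing import Any, Dict, Optional


def low_memory_overrides(config: Dict[str, Any],
                         batch_size: Optional[int] = None) -> Dict[str, Any]:
    """Return a copy of `config` with the memory-saving tweaks applied."""
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    cfg["calc_fid"] = False       # skip FID evaluation during training
    cfg["calc_fid_ext2"] = False
    cfg["ext_mult_list"] = [2]    # stop logging images at 4x testing resolution
    if batch_size is not None:
        cfg["batch_size"] = batch_size  # may influence model performance
    return cfg
```

Leaving `batch_size` optional keeps the tweak that can affect model quality as an explicit, separate decision.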
This repository borrows code from different sources; please follow the user licenses of each source while using this project.
Note that the code release aims to support the open-source culture in the computer vision research community, facilitating research efficiency and keeping the community open to newcomers. The first author strongly discourages research groups that do not match such an open-source culture from using or reading any piece of code in this repository. Please keep the closed-door culture bidirectional.
The implementation heavily borrows from StyleGAN2-Pytorch, pytorch-fid and PerceptualSimilarity.
@inproceedings{
lin2021infinity,
title={Infinity{GAN}: Towards Infinite-Pixel Image Synthesis},
author={Lin, Chieh Hubert and Cheng, Yen-Chi and Lee, Hsin-Ying and Tulyakov, Sergey and Yang, Ming-Hsuan},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=ufGMqIM0a4b},
}