This repo contains a PyTorch implementation of direction discovery for BigGAN using OroJaR. The code is based on the Hessian Penalty codebase; we thank the authors for their excellent work.
Follow the simple setup instructions here. We trained our models with PyTorch 1.7.1.
Make sure you are using a recent version of PyTorch (>= 1.6.0); otherwise, you may have trouble loading our checkpoint directions.
Our visualization and training scripts automatically download a pre-trained BigGAN checkpoint for you. Alternatively, you can download the BigGAN model from Google Drive and put it in the `./checkpoints` directory.
This repo comes with pre-trained directions from the golden retrievers and churches experiments in our paper; see the `checkpoints/directions/orojar` directory. To generate videos showcasing each learned direction, run one of the scripts in `scripts/visualize/orojar` (e.g., `scripts/visualize/orojar/vis_goldens_coarse.sh`). This will generate several videos demonstrating each of the learned directions. Each row corresponds to a different direction, and each column applies that direction to a different sampled image from the generator. For comparison, we also include pre-trained BigGAN directions from the GAN Latent Discovery repo and the Hessian Penalty repo; run `scripts/visualize/vis_voynov.sh` or the scripts in `scripts/visualize/hessian` to visualize those.
You can add several options to the visualization command (see `utils.py` for a full list):

- `--path_size` controls how "much" to move in the learned directions
- `--directions_to_vis` can be used to visualize just a subset of directions (e.g., `--directions_to_vis 0 5 86`)
- `--fix_class`, if specified, will only sample images from the given ImageNet class (you can find a mapping of class indices to human-readable labels here)
- `--load_A` controls which directions checkpoint to load from; you can set it to `random` to visualize random orthogonal directions, `coords` to see what each individual z-component does, or set it to your own learned directions to visualize them (see the sketch after this list)
- `--val_minibatch_size` controls the batching for generating the videos; decrease this if you have limited GPU memory
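As a rough illustration of what the built-in `--load_A` settings correspond to, here is a minimal sketch (not the repo's code; the variable names and the use of `torch.nn.init.orthogonal_` are illustrative assumptions):

```python
import torch

dim_z, ndirs = 120, 120  # BigGAN's default 120-dimensional z-space

# --load_A coords: each "direction" is a single z-coordinate, i.e. a row of the identity.
A_coords = torch.eye(ndirs, dim_z)

# --load_A random: random orthonormal directions (orthogonal_ fills the tensor with a
# (semi-)orthogonal matrix, so its rows are orthonormal when ndirs <= dim_z).
A_random = torch.nn.init.orthogonal_(torch.empty(ndirs, dim_z))

# A direction k is visualized by moving a sampled latent z along row k of A;
# --path_size roughly controls how far this sweep moves along the direction.
z = torch.randn(4, dim_z)
alpha = 3.0
z_edited = z + alpha * A_random[0]
```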
Note that BigGAN, by default, has quite a bit of innate disentanglement between the latent z vector and the class label. This means the directions tend to generalize well to other classes, so feel free to pass a different `--fix_class` argument to visualize samples from categories other than the ones you used for training.
To start direction discovery, you can run one of the scripts in scripts/discover/orojar
(e.g., discover_coarse_goldens.sh
, discover_mid_goldens.sh
, etc.). This will launch orojar_discover.py
which learns a matrix of shape (ndirs, dim_z)
, where ndirs
indicates the number of directions being learned.
There are several training options you can play with (see `utils.py` for a full list):

- `--G_path` can be set to a pre-trained BigGAN checkpoint to run discovery on (if set to the default value `None`, we will download a 128x128 model automatically for you)
- `--A_lr` controls the learning rate
- `--fix_class`, if specified, will restrict the sampled class input to the generator to the specified ImageNet class index. In our experiments, we restricted it to either `207` (golden retrievers) or `497` (churches), but you can try setting this argument to `None` and sampling classes randomly during training as well.
- `--ndirs` specifies the number of directions to be learned
- `--no_ortho` can be added to learn an unconstrained matrix of directions (by default, the directions are constrained to be orthonormal to prevent degenerate solutions; see the sketch after this list)
- `--search_space` is set to `'all'` by default, which searches for directions in the entirety of z-space (which by default is 120-dimensional). You can instead set `--search_space coarse` to search for directions in just the first 40 z-components, `--search_space mid` to search in the middle 40 z-components, or `--search_space fine` to search in the final 40 z-components (the settings we used for the experiments reported in our paper). This is in a similar spirit to "style mixing" in StyleGAN, where it is often beneficial to take advantage of the natural disentanglement learned by modern GANs. For example, the first 40 z-components in vanilla BigGAN mostly correspond to factors of variation related to object pose, while the middle 40 z-components mainly control factors such as lighting and background. You can use this argument to take advantage of this natural disentanglement.
- `--wandb_entity` can be specified to enable logging to Weights and Biases (otherwise TensorBoard is used)
- `--vis_during_training` can be added to periodically log learned direction GIFs to WandB/TensorBoard
- `--batch_size` can be decreased if you run out of GPU memory (in our experiments, we used 2 GPUs with a batch size of 32)
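As referenced in the `--no_ortho` bullet above, one common way to keep a learned direction matrix orthonormal is to re-orthogonalize the underlying parameter on every forward pass. The class below is an assumed, QR-based illustration of that idea, not the repo's exact mechanism:

```python
import torch

class OrthoDirections(torch.nn.Module):
    """Illustrative: stores an unconstrained (ndirs, dim_z) parameter and returns a
    row-orthonormal matrix derived from it, which rules out degenerate solutions
    such as duplicated or collapsed directions."""
    def __init__(self, ndirs=120, dim_z=120):
        super().__init__()
        self.raw = torch.nn.Parameter(torch.randn(ndirs, dim_z))

    def forward(self):
        q, _ = torch.linalg.qr(self.raw.t())   # orthonormal basis for the row space
        return q.t()                           # (ndirs, dim_z), rows are orthonormal
```

In this picture, passing `--no_ortho` would correspond to using the raw parameter directly instead of its orthonormalized version.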
Below are the indices for the directions we reported. You can use `--directions_to_vis <indices>` to visualize selected directions.
- Rotation: 0
- Zoom: 7
- Shift: 9
- Colorization: 3
- Lighting: 6
- Object Lighting: 4
- Red Color Filter: 1
- Brightness: 5
- White Color Filter: 13
- Saturation: 20
- Rotation: 0
- Zoom: 7
- Smoosh: 9
- Background Removal: 0
- Scene Lighting: 8
- Object Lighting: 2
- Colorize: 21
- Red Color Filter: 5
- Brightness: 4
- Green Color Filter: 34
- Saturation: 17
If our code aided your research, please cite our paper:
@InProceedings{Wei_2021_ICCV,
author = {Wei, Yuxiang and Shi, Yupeng and Liu, Xiao and Ji, Zhilong and Gao, Yuan and Wu, Zhongqin and Zuo, Wangmeng},
title = {Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {6721-6730}
}
This repo builds upon Hessian Penalty and Andy Brock's PyTorch BigGAN library. We thank the authors for open-sourcing their code. The original licenses can be found in Hessian LICENSE and BigGAN LICENSE.