Implementation of StyleGAN2 as final project for subject Deep Learning @ FRI, University of Ljubljana, 2022/23.

Deep learning project - StyleGAN2

This is the source code for a project in the Deep Learning course: a re-implementation and evaluation of the StyleGAN2 paper, "Analyzing and Improving the Image Quality of StyleGAN". The implementation uses PyTorch.

The goal is to first implement, train, and evaluate StyleGAN2 using the popular FID metric. After that, the idea (hope) is to implement its improved variant, StyleGAN2-ADA.


If you intend to run anything from this repository on Windows, make sure the directory where you cloned it is added to PYTHONPATH. On Linux it seems to work fine without that.
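If changing PYTHONPATH system-wide is inconvenient, one possible alternative is to extend `sys.path` at the top of the script you run. This is a minimal sketch under that assumption, not something the repository does itself; `add_repo_to_path` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put the repository root at the front of sys.path if it is not there yet.

    `repo_root` is the directory the repository was cloned into
    (hypothetical helper, not part of this repository).
    """
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Call this before importing the repository's own packages.
# Here the current working directory stands in for the clone location.
repo_root = add_repo_to_path(Path.cwd())
```

The effect lasts only for the running process, so it has to be repeated in every entry script, whereas PYTHONPATH is set once per environment.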

Repository structure

A brief overview of the structure of the repository:

├── celeba/                         # Dataset. Download instructions below
│   └── img_align_celeba
│       ├── 000001.jpg
│       ├── 000002.jpg
│       ├── 000003.jpg
│       └── ...
├── discriminator_utils/            # Discriminator layers
│   ├──      # Residual discriminator block (2 3x3 conv layers, residual 1x1 conv)
│   └──                 # FromRGB layer
├── general_utils/                  # General utility functions
│   ├──                # Linear and convolution layers with equalized learning rate
│   ├──          # Function that generates the noise input for each generator block
│   ├──                   # Function for setting up a logger
│   ├──                   # Non-saturating logistic loss (original GAN), R1 and path length regularizations
│   ├──                    # Proxy function that shows hints when invoking the forward function of a nn.Module
│   └──                 # Upsampling and downsampling operations using FIR filter smoothing
├── generator_utils/                # Generator layers
│   ├──          # Skip generator block (2 3x3 convs with weight demodulation, ToRGB)
│   └──                   # ToRGB layer
├── model/
│   ├──                  # Function to return train (and optional test) DataLoader
│   ├──            # Discriminator put together
│   ├──                # Generator put together
│   ├──          # Mapping network + truncation trick
│   └──                    # Put entire model together, hyperparameters, train loop, checkpoints, ...
├──            # Continue training from a pretrained model
├──                     # Generate images using a pretrained model for evaluating FID
├──               # Resize test CelebA images to 64x64 for evaluating FID
└──                 # Train a model from scratch. Uses CUDA automatically if available.

I tried to partition the repository, and the code itself, as neatly as possible. If you happen to be reading this, and have any suggestions on how the code/structure could be improved, do let me know.

If you're wondering why the CelebA images from the test split are resized to 64x64: the models I trained only go up to 64x64 resolution, so I wanted an "even playing field" when evaluating FID.

CelebA dataset

The dataset used is CelebA, which is available in PyTorch. However, if you try to download it by setting download=True, you may (and most probably will) get a RuntimeError: The daily quota of the file is exceeded and it can't be downloaded. This is a limitation of Google Drive and can only be overcome by trying again later.

To avoid having to wait, I suggest going to the official Google Drive for CelebA, and downloading the following 6 files:

  • From Img
  • From Eval
    • list_eval_partition.txt
  • From Anno
    • list_landmarks_align_celeba.txt
    • list_bbox_celeba.txt
    • list_attr_celeba.txt
    • identity_CelebA.txt

Place all of them inside the folder celeba, following the repository structure. Then, simply let PyTorch and model/ do the rest.
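Before starting a long training run, it can help to confirm that the folder matches the layout above. The following sketch checks only the five annotation files named in the list and the `img_align_celeba` folder from the repository structure; `missing_celeba_items` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Annotation files named in the list above. The images are expected
# to sit in the img_align_celeba/ folder shown in the repository structure.
REQUIRED_FILES = [
    "list_eval_partition.txt",
    "list_landmarks_align_celeba.txt",
    "list_bbox_celeba.txt",
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
]


def missing_celeba_items(celeba_dir="celeba"):
    """Return the names of expected files or folders that are absent."""
    root = Path(celeba_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not (root / "img_align_celeba").is_dir():
        missing.append("img_align_celeba/")
    return missing
```

Calling `missing_celeba_items()` from the repository root returns an empty list when everything is in place, and otherwise the names that still need to be downloaded or extracted.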


All of the models I tested used the following hyperparameters:

num_training_images         = 70000         # Number of CelebA training images
batch_size                  = 32            # Batch size
dim_latent                  = 512           # Dimensionality of latent variables `z` and `w`
adam_betas                  = (0.0, 0.99)   # Betas for Adam optimizer
gamma                       = 10            # Gradient penalty coefficient gamma
use_loss_regularization     = True          # Use R1 (gradient penalty) and path length regularization
checkpoint_interval         = 1000          # How often to save a checkpoint
generate_progress_images    = True          # Whether to also generate some images every `checkpoint_interval` steps
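For readers unfamiliar with the loss terms behind `use_loss_regularization` and `gamma`, here is a framework-free sketch of the non-saturating logistic loss and the R1 weighting in their standard textbook form. It works on plain floats for illustration only; the function names are assumptions, and the repository's own loss code operates on tensors and may differ in detail.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logit, fake_logit):
    """Logistic loss for D: push real logits up and fake logits down."""
    return softplus(-real_logit) + softplus(fake_logit)


def generator_loss(fake_logit):
    """Non-saturating loss for G: minimize -log(sigmoid(D(G(z))))."""
    return softplus(-fake_logit)


def r1_penalty(grad_sq_norms, gamma=10.0):
    """R1 term: (gamma / 2) * mean squared gradient norm of D at real images.

    `grad_sq_norms` holds one value ||dD(x)/dx||^2 per real image. In a real
    training loop these would come from automatic differentiation.
    """
    return 0.5 * gamma * sum(grad_sq_norms) / len(grad_sq_norms)
```

With `gamma = 10`, a mean squared gradient norm of 2.0 adds a penalty of 10.0 to the discriminator loss. Path length regularization is not shown here.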

The one model that worked used gan_lr = 0.002 (learning rate for generator and discriminator), mapping_network_lr = 0.0002 (learning rate for the mapping network), and gradient_accumulate_steps = 4 (how many steps to accumulate gradients for). It was trained for 10000 steps. It is available on this OneDrive link, with the name stylegan2-3idx-10000steps.pth, if you want to download it.
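With `batch_size = 32` and `gradient_accumulate_steps = 4`, each optimizer update is based on 128 images. The toy sketch below (plain Python, a one-parameter least-squares model, all names hypothetical) illustrates why averaging the gradients of equally sized micro-batches gives the same gradient as one large batch. It assumes gradients are averaged rather than summed across accumulation steps; the training loop in this repository may handle the scaling differently.

```python
def mse_grad(w, batch):
    """Gradient of the mean squared error of the model y = w * x w.r.t. w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Average the gradients of several equally sized micro-batches."""
    total = 0.0
    for batch in micro_batches:
        # Scaling by 1 / len(micro_batches) keeps the result a mean, not a sum.
        total += mse_grad(w, batch) / len(micro_batches)
    return total


batch_size = 32
gradient_accumulate_steps = 4
effective_batch_size = batch_size * gradient_accumulate_steps  # 128

# Toy data following y = 3x, split into 4 micro-batches of 32 samples each.
data = [(0.01 * i, 0.03 * i) for i in range(effective_batch_size)]
micro_batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

full_grad = mse_grad(0.5, data)                     # One pass over all 128 samples
accum_grad = accumulated_grad(0.5, micro_batches)   # Four passes over 32 samples
```

The two gradients agree up to floating-point rounding, which is what lets accumulation emulate a larger batch when GPU memory is limited.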

By "worked", I mean that the model didn't experience mode collapse and was able to generate images of something that (at least at 10000 steps) resembles a face. Here's an example of a 4x4 grid generated by that model.

[Image: 4x4 grid of faces generated by stylegan2-3idx-10000steps.pth]

I tried using the truncation trick introduced in the original StyleGAN, but I'm not sure it helped. The image above was generated with truncation_psi=0. My implementation of it could be wrong, or (more likely) I'm missing something else. This is what truncation should look like.
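For reference, the truncation trick as formulated in the StyleGAN paper interpolates each mapped latent `w` toward the average latent. The sketch below uses plain lists and made-up values; `truncate`, `w_avg` and the sample numbers are illustrative assumptions, not code from this repository.

```python
def truncate(w, w_avg, psi):
    """Pull a latent vector `w` toward the average latent `w_avg`.

    psi = 1 leaves `w` unchanged; psi = 0 returns `w_avg` for every input.
    """
    return [avg + psi * (wi - avg) for wi, avg in zip(w, w_avg)]


w_avg = [0.0, 1.0, -1.0]   # Stand-in for a running average of mapping-network outputs
w = [2.0, 3.0, -5.0]       # Stand-in for one mapped latent

half = truncate(w, w_avg, psi=0.5)   # Halfway between w_avg and w
```

Under this formulation, psi = 1 disables truncation and psi = 0 collapses every sample onto the average latent, which may be worth checking against the behavior described above.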

FID score

For evaluating FID, I first took the test split of the CelebA dataset (19962 images) and resized all of the images to 64x64. This can be done with the resizing script listed in the repository structure above. The resized images are placed in celeba_test_64x64 by default.
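One way to pick out the test split is to read list_eval_partition.txt, where each line pairs an image name with a partition id (0 for train, 1 for validation, 2 for test). The sketch below covers only that selection step and uses a hypothetical helper name; the resizing itself needs an image library and is not shown, and the repository's script may select the test images differently.

```python
def select_test_split(partition_lines):
    """Return image names whose partition id is 2 (the test split).

    Each line of list_eval_partition.txt has the form "<image name> <id>".
    """
    names = []
    for line in partition_lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == "2":
            names.append(parts[0])
    return names


# Tiny in-memory example. For the real file, pass an open file object instead,
# e.g. open("celeba/list_eval_partition.txt").
sample = ["000001.jpg 0", "162771.jpg 1", "182638.jpg 2", "202599.jpg 2"]
test_images = select_test_split(sample)
```

On the full partition file this selection should yield the 19962 test images mentioned above.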

Then, with the image generation script, I generated the same number of images using the aforementioned trained model, namely stylegan2-3idx-10000steps.pth. The generated images are placed under generated_3idx_10000steps by default.

The model achieved an FID score of 38.365, which isn't a great result, but it's not all that terrible either.
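For context, FID fits a Gaussian to the Inception-v3 features of the real images and another to the features of the generated images, then measures the Fréchet distance between the two. The standard definition is given below; the symbols are introduced here for illustration, with \mu_r, \Sigma_r the feature mean and covariance of the real images and \mu_g, \Sigma_g those of the generated images. Lower is better, and identical statistics give 0.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```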

I also tried training the model for an additional 10000 steps (stylegan2-3idx-20000steps.pth in the aforementioned OneDrive link). I repeated the same procedure, and that model achieved an FID score of 23.937, which is a rather significant improvement. Here's an example of a 4x4 grid generated by that model. The faces are slightly clearer than those from the less-trained model.

[Image: 4x4 grid of faces generated by stylegan2-3idx-20000steps.pth]

I also trained the model up to 30000 and 40000 steps, but the improvement was small: FID scores were 22.957 and 21.229, respectively, compared to 23.937 at 20000 steps.

What didn't work

Hyperparameters that didn't work (all of these models experienced mode collapse):

  • gan_lr = 0.001, mapping_network_lr = 0.0001, gradient_accumulate_steps = 1
  • gan_lr = 0.001, mapping_network_lr = 0.001, gradient_accumulate_steps = 4
  • gan_lr = 0.0001, mapping_network_lr = 0.0001, gradient_accumulate_steps = 4
  • gan_lr = 0.0002, mapping_network_lr = 0.0002, gradient_accumulate_steps = 1
  • gan_lr = 0.002, mapping_network_lr = 0.0002, gradient_accumulate_steps = 1 (this one didn't mode collapse as badly as the others, but it still did)
  • gan_lr = 0.0002, mapping_network_lr = 0.0002, gradient_accumulate_steps = 4
  • gan_lr = 0.002, mapping_network_lr = 0.0002, gradient_accumulate_steps = 8
  • gan_lr = 0.002, mapping_network_lr = 0.0002, gradient_accumulate_steps = 4, adam_betas = (0.5, 0.99)


To anyone viewing this repository, I should note that this was my first experience with GANs, so I advise taking the hyperparameters and the training loop with a grain of salt. Training GANs is (as I have experienced) quite a difficult task, and there are certainly areas where improvements are possible.

However, I should also point out that I invested a lot of effort browsing through 5 different repositories regarding StyleGAN(2, 2-ADA) implementations, and reading through the StyleGAN papers trying to at the very least match the structure of all individual layers. I think the structure is for the most part okay, but again, this was my first time working with anything related to GANs, so there may be some errors or oversights.

