We propose Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs for text-guided human face synthesis. By leveraging recent advances in Contrastive Language-Image Pre-training (CLIP), our method requires no text data during training. Fast text2StyleGAN is formulated as a conditional variational autoencoder (CVAE) that provides extra control and diversity to the generated images at test time. Our model does not require re-training or fine-tuning of the GANs or CLIP when encountering new text prompts. Unlike prior work, we do not rely on optimization at test time, which makes our method orders of magnitude faster. Empirically, on the FFHQ dataset, our method offers faster and more accurate generation of images from natural language descriptions with varying levels of detail compared to prior work.
Examples of text-driven face image synthesis by our proposed method, Fast text2StyleGAN. The text prompts are increasingly detailed. Each image takes about 0.09s to produce.
This repo contains the official implementation of Fast text2StyleGAN. Besides the training code, code for a Streamlit inference GUI is also included. Please refer to `streamlit_gui/README.md` for details.
- Clone this repo:
git clone https://github.com/duxiaodan/Fast_text2StyleGAN.git
cd Fast_text2StyleGAN
- Dependencies:
Create a new Conda environment using `environment.yml`. Then install CLIP with the following command:
pip install git+https://github.com/openai/CLIP.git
- Download the pre-trained model from this link and put the file under `logs/cvae_v7/checkpoints/`. You can change `cvae_v7` to any name you like, but make sure the variable `trial` in the function `prepare_models()` in `streamlit_gui/app.py` matches the new name.
- Download the config file for the pre-trained model from this link and put the file under `logs/cvae_v7/`.
- Our KNN method ("Ours" in the paper) also requires CLIP embeddings of the FFHQ dataset. We've pre-computed them and you can download the hdf5 file from this link. Put it under `data/`. Though they are not our best method, you can also play with the other two methods ("Text Only" and "Text+Image") if you want.
- Now you can refer to `streamlit_gui/README.md` for how to launch the Streamlit GUI and start generating faces!
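The README does not spell out how the KNN method uses the stored FFHQ embeddings, so the following is only a generic illustration of nearest-neighbour lookup by cosine similarity, not the authors' implementation. The function names and the toy 3-D vectors are hypothetical; real CLIP `ViT-B/16` embeddings are 512-dimensional and would normally be handled with NumPy or PyTorch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(query, embeddings, k=3):
    """Return the indices of the k embeddings most similar to `query`, best first."""
    scored = [(cosine_similarity(query, e), i) for i, e in enumerate(embeddings)]
    # Sort by descending similarity; break ties by the lower index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy vectors standing in for pre-computed image embeddings (hypothetical data).
    bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    # Toy vector standing in for the CLIP embedding of a text prompt.
    text_embedding = [1.0, 0.05, 0.0]
    print(k_nearest(text_embedding, bank, k=2))
```

Because CLIP places text and images in a shared embedding space, a lookup of this kind can relate a text prompt to nearby image embeddings; how the repo actually uses the neighbours is defined in its own code.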
- To train your own model, put both the training data (FFHQ) and the testing data (CelebAHQ) under `data/`, in separate folders. The folder structure should look like `data/FFHQ/00000.png`.
- The dataloader takes both images and CLIP embeddings, so you'll also need to pre-compute CLIP embeddings for both the training data and the testing data. We pre-computed CLIP embeddings with the `ViT-B/16` encoder for FFHQ and CelebAHQ. You can find them here and here.
- In `configs/paths_config.py`, change the paths so that they match your own data files.
- Download the StyleGAN2 model pre-trained on FFHQ from this link and put it under `pretrained_models`. In the terminal, run the command below to start training.
bash launch_training.sh
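If you need CLIP embeddings for your own images, the sketch below shows one way to compute them with the `ViT-B/16` encoder and store them in an HDF5 file. It is not the script used to produce the released files: the function names, the HDF5 dataset names (`embeddings`, `filenames`), and the choice to store unnormalized float32 features are all assumptions, so check what the repo's dataloader expects before relying on it. It assumes `torch`, `clip`, `h5py`, and `Pillow` are installed.

```python
from pathlib import Path


def list_images(folder):
    """Return the PNG files directly under `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.png"))


def compute_clip_embeddings(folder, out_path, batch_size=64):
    """Encode every PNG in `folder` with CLIP ViT-B/16 and save the result to HDF5."""
    # Heavy dependencies are imported lazily so list_images() works without them.
    import clip
    import h5py
    import torch
    from PIL import Image

    paths = list_images(folder)
    if not paths:
        raise FileNotFoundError(f"no PNG files found in {folder}")

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/16", device=device)
    model.eval()

    chunks = []
    with torch.no_grad():
        for start in range(0, len(paths), batch_size):
            batch_paths = paths[start:start + batch_size]
            batch = torch.stack(
                [preprocess(Image.open(p).convert("RGB")) for p in batch_paths]
            ).to(device)
            # encode_image returns one feature vector per image.
            chunks.append(model.encode_image(batch).float().cpu())
    embeddings = torch.cat(chunks).numpy()

    with h5py.File(out_path, "w") as f:
        # Dataset names are hypothetical; rename them to match the dataloader.
        f.create_dataset("embeddings", data=embeddings)
        f.create_dataset("filenames", data=[p.name.encode("utf-8") for p in paths])
    return embeddings.shape
```

A call such as `compute_clip_embeddings("data/FFHQ", "data/ffhq_clip.hdf5")` would then write one row per image; the output filename here is only an example.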
- Weights & Biases is also supported. Just turn on the flag `--use_wandb`.
- If you want to start from a previous checkpoint, specify its path using the flag `--checkpoint_path`.
- If you want to run SLURM sequential jobs, turn on the flag `--sequential` and set `--checkpoint_path` to `"auto"`.
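To make the flag combinations above concrete, here is a minimal, hypothetical sketch of how a launcher could parse them. Only the three flag names come from this README; the parser, the `resolve_checkpoint` helper, the `.pt` extension, and the interpretation of `"auto"` as "resume from the newest checkpoint, if any" are assumptions, and the repo's actual behaviour may differ.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser for the three flags named in this README (hypothetical launcher)."""
    parser = argparse.ArgumentParser(description="Hypothetical training launcher")
    parser.add_argument("--use_wandb", action="store_true",
                        help="log the run to Weights & Biases")
    parser.add_argument("--sequential", action="store_true",
                        help="run as one job in a chain of SLURM jobs")
    parser.add_argument("--checkpoint_path", default=None,
                        help='path to a checkpoint, or "auto"')
    return parser


def resolve_checkpoint(checkpoint_path, checkpoint_dir):
    """Assumed meaning of "auto": pick the most recently modified *.pt file.

    Returns None when no checkpoint exists yet, e.g. for the first job in a chain.
    """
    if checkpoint_path != "auto":
        return checkpoint_path
    candidates = sorted(Path(checkpoint_dir).glob("*.pt"),
                        key=lambda p: p.stat().st_mtime)
    return str(candidates[-1]) if candidates else None


if __name__ == "__main__":
    args = build_parser().parse_args()
    # The directory below follows the checkpoint layout mentioned in this README.
    print(resolve_checkpoint(args.checkpoint_path, "logs/cvae_v7/checkpoints"))
```

Under this reading, every job in a SLURM chain can be submitted with the same command line, and each one picks up wherever the previous job stopped.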
Our code is adapted from the pSp implementation:
https://github.com/eladrich/pixel2style2pixel
Copyright (c) 2020 Elad Richardson, Yuval Alaluf
License (MIT) https://github.com/eladrich/pixel2style2pixel/blob/master/LICENSE
StyleGAN2 implementation:
https://github.com/rosinality/stylegan2-pytorch
Copyright (c) 2019 Kim Seonghyeon
License (MIT) https://github.com/rosinality/stylegan2-pytorch/blob/master/LICENSE
LPIPS implementation:
https://github.com/S-aiueo32/lpips-pytorch
Copyright (c) 2020, Sou Uchida
License (BSD 2-Clause) https://github.com/S-aiueo32/lpips-pytorch/blob/master/LICENSE
VAE implementation:
https://github.com/AntixK/PyTorch-VAE
Copyright (c) 2020, Anand Krishnamoorthy Subramanian
License (Apache License 2.0) https://github.com/AntixK/PyTorch-VAE/blob/master/LICENSE.md
Please note: The CUDA files under the StyleGAN2 ops directory are made available under the Nvidia Source Code License-NC.
If you use this code for your research, please cite our paper Text-Free Learning of a Natural Language Interface for Pretrained Face Generators:
@InProceedings{ Du_ARXIV_2022,
author = {Du, Xiaodan and Yeh, Raymond A. and Kolkin, Nicholas and Shechtman, Eli and Shakhnarovich, Greg},
title = {Text-Free Learning of a Natural Language Interface for Pretrained Face Generators},
journal = {arXiv preprint arXiv:2209.03953},
year = {2022},
}