BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
This is the official PyTorch code for the paper:
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Shaozhe Hao1, Xuantong Liu2, Xianbiao Qi3*, Shihao Zhao1, Bojia Zi4, Rong Xiao3, Kai Han1†, Kwan-Yee K. Wong1†
1The University of Hong Kong 2Hong Kong University of Science and Technology
3Intellifusion 4The Chinese University of Hong Kong
(*: Project lead; †: Corresponding authors)
[Project page] [arXiv] [Colab]
TL;DR: We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities.
🌟 We are training BiGR with REPA, a representation alignment regularization that enhances both generation and representation performance in DiT/SiT.
You can simply install the environment with the file `environment.yml` by:

```
conda env create -f environment.yml
conda activate BiGR
```
Please first download the pretrained weights for tokenizers and BiGR models to run our tests.
We train Binary Autoencoder (B-AE) by adapting the official code of Binary Latent Diffusion. We provide pretrained weights for different configurations.
256x256 resolution
B-AE | Size | Checkpoint |
---|---|---|
d24 | 332M | download |
d32 | 332M | download |
512x512 resolution
B-AE | Size | Checkpoint |
---|---|---|
d32-512 | 315M | download |
We provide pretrained weights for BiGR models in various sizes.
256x256 resolution
Model | B-AE | Size | Checkpoint |
---|---|---|---|
BiGR-L-d24 | d24 | 1.35G | download |
BiGR-XL-d24 | d24 | 3.20G | download |
BiGR-XXL-d24 | d24 | 5.92G | download |
BiGR-XXL-d32 | d32 | 5.92G | download |
512x512 resolution
Model | B-AE | Size | Checkpoint |
---|---|---|---|
BiGR-L-d32-res512 | d32-res512 | 1.49G | download |
We provide the sample script for 256x256 image generation in `script/sample.sh`:

```
bash script/sample.sh
```

Please specify the code dimension `$CODE`, your B-AE checkpoint path `$CKPT_BAE`, and your BiGR checkpoint path `$CKPT_BIGR`.
You may also want to try different settings of the CFG scale `$CFG`, the number of sample iterations `$ITER`, and the Gumbel temperature `$GUMBEL`. We recommend using a small Gumbel temperature for better visual quality (e.g., `GUMBEL=0`). You can increase the Gumbel temperature to enhance generation diversity.
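To give a rough intuition for how the Gumbel temperature trades visual quality for diversity, here is a hedged sketch (the function name, shapes, and exact noise formulation are hypothetical, not the repo's API): binary codes are drawn from Bernoulli logits perturbed by temperature-scaled logistic (Gumbel-difference) noise, so a temperature of 0 reduces to deterministic thresholding.

```python
import torch

def sample_binary_codes(logits: torch.Tensor, gumbel_temp: float = 0.0) -> torch.Tensor:
    """Hypothetical sketch: turn Bernoulli logits into binary codes.

    gumbel_temp=0 thresholds deterministically (sharpest samples);
    larger temperatures inject noise, yielding more diverse codes.
    """
    if gumbel_temp == 0.0:
        # Deterministic: pick the more likely bit.
        return (logits > 0).float()
    # Logistic noise equals the difference of two Gumbel samples; scaling
    # it by the temperature controls how far samples stray from the mode.
    u = torch.rand_like(logits).clamp(1e-9, 1 - 1e-9)
    logistic_noise = torch.log(u) - torch.log1p(-u)
    return ((logits + gumbel_temp * logistic_noise) > 0).float()
```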
You can generate 512x512 images using `script/sample_512.sh`. Note that you need to specify the corresponding 512x512 tokenizer and model.

```
bash script/sample_512.sh
```
BiGR supports various zero-shot generalized applications, without the need for task-specific structural changes or parameter fine-tuning.
You can easily download testing images and run our scripts to get started. Feel free to play with your own images.
```
bash script/app_inpaint.sh
bash script/app_outpaint.sh
```
You need to save the source image and the mask in the same folder, with the image as a `*.JPEG` file and the mask as a `*.png` file. You can then specify the source image path `$IMG`.
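For example, the expected folder layout can be prepared like this (a minimal sketch with placeholder content; the folder and file names are hypothetical, and "white marks the region to fill" is an assumed mask convention):

```python
import os
from PIL import Image

# Hypothetical example folder: the source image saved as *.JPEG and the
# mask saved as *.png, side by side, as the inpainting script expects.
folder = "inpaint_example"
os.makedirs(folder, exist_ok=True)

# Dummy 256x256 RGB source image (solid gray placeholder).
Image.new("RGB", (256, 256), color=(128, 128, 128)).save(
    os.path.join(folder, "source.JPEG"))

# Dummy single-channel mask (all black here; white would mark the
# region to fill, under the assumed convention).
Image.new("L", (256, 256), color=0).save(
    os.path.join(folder, "source.png"))
```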
You can customize masks using this Gradio demo.
```
bash script/app_edit.sh
```
In addition to the source image path `$IMG`, you also need to give a class index `$CLS` for editing.
```
bash script/app_interpolate.sh
```
You need to specify two class indices `$CLS1` and `$CLS2`.
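Conceptually (this is a simplification for intuition only, not the repo's exact procedure), interpolating between two classes can be thought of as blending the two conditional Bernoulli probability maps before resampling binary codes:

```python
import torch

def blend_and_sample(p_cls1: torch.Tensor, p_cls2: torch.Tensor,
                     alpha: float) -> torch.Tensor:
    """Hypothetical sketch: mix two Bernoulli probability maps with
    weight alpha in [0, 1], then resample binary codes from the mixture.

    alpha=0 follows class 1; alpha=1 follows class 2.
    """
    p = (1.0 - alpha) * p_cls1 + alpha * p_cls2
    return torch.bernoulli(p)
```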
```
bash script/app_enrich.sh
```
You need to specify the source image path `$IMG`.
You can train BiGR yourself by running:
```
bash script/train.sh
```
You need to specify the ImageNet-1K dataset path via `--data-path`.
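Before launching a long run, a quick sanity check on the dataset path can save time. This hypothetical helper assumes the standard ImageFolder-style layout `root/<class>/<image>.JPEG` (1000 class subfolders for ImageNet-1K train):

```python
import os

def count_imagenet_classes(root: str) -> int:
    """Hypothetical sanity check: count class subfolders under the
    ImageNet-1K root passed via --data-path (expect 1000 for train)."""
    return sum(
        1 for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d))
    )
```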
We train L/XL-sized models using 8 A800 GPUs and XXL-sized models using 32 A800 GPUs on 4 nodes.
This project builds on Diffusion Transformer, Binary Latent Diffusion, and LlamaGen. We thank these great works!
If you use this code in your research, please consider citing our paper:
```
@misc{hao2024bigr,
      title={Bi{GR}: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities},
      author={Shaozhe Hao and Xuantong Liu and Xianbiao Qi and Shihao Zhao and Bojia Zi and Rong Xiao and Kai Han and Kwan-Yee~K. Wong},
      year={2024},
}
```