BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
This is the official PyTorch code for the paper:
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Shaozhe Hao1, Xuantong Liu2, Xianbiao Qi3*, Shihao Zhao1, Bojia Zi4, Rong Xiao3, Kai Han1†, Kwan-Yee K. Wong1†
1The University of Hong Kong 2Hong Kong University of Science and Technology
3Intellifusion 4The Chinese University of Hong Kong
(*: Project lead; †: Corresponding authors)
[Project page] [arXiv] [Colab]
TL;DR: We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities.
🌟 We are training BiGR with REPA, a representation alignment regularization that enhances both generation and representation performance in DiT/SiT.
You can simply install the environment with the file `environment.yml` by:

```
conda env create -f environment.yml
conda activate BiGR
```
Please first download the pretrained weights for tokenizers and BiGR models to run our tests.
We train Binary Autoencoder (B-AE) by adapting the official code of Binary Latent Diffusion. We provide pretrained weights for different configurations.
256x256 resolution
B-AE | Size | Checkpoint |
---|---|---|
d24 | 332M | download |
d32 | 332M | download |
512x512 resolution
B-AE | Size | Checkpoint |
---|---|---|
d32-512 | 315M | download |
We provide pretrained weights for BiGR models in various sizes.
256x256 resolution
Model | B-AE | Size | Checkpoint |
---|---|---|---|
BiGR-L-d24 | d24 | 1.35G | download |
BiGR-XL-d24 | d24 | 3.20G | download |
BiGR-XXL-d24 | d24 | 5.92G | download |
BiGR-XXL-d32 | d32 | 5.92G | download |
512x512 resolution
Model | B-AE | Size | Checkpoint |
---|---|---|---|
BiGR-L-d32-res512 | d32-res512 | 1.49G | download |
We provide the sample script for 256x256 image generation in `script/sample.sh`:

```
bash script/sample.sh
```

Please specify the code dimension `$CODE`, your B-AE checkpoint path `$CKPT_BAE`, and your BiGR checkpoint path `$CKPT_BIGR`.
You may also want to try different settings of the CFG scale `$CFG`, the number of sample iterations `$ITER`, and the Gumbel temperature `$GUMBEL`. We recommend using a small Gumbel temperature for better visual quality (e.g., `GUMBEL=0`). You can increase the Gumbel temperature to enhance generation diversity.
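To give a rough intuition for how the Gumbel temperature trades visual quality for diversity, here is a hedged sketch (the function name, shapes, and exact noise formulation are hypothetical, not the repo's API): binary codes are drawn from Bernoulli logits perturbed by temperature-scaled logistic (Gumbel-difference) noise, so a temperature of 0 reduces to deterministic thresholding.

```python
import torch

def sample_binary_codes(logits: torch.Tensor, gumbel_temp: float = 0.0) -> torch.Tensor:
    """Hypothetical sketch: turn Bernoulli logits into binary codes.

    gumbel_temp=0 thresholds deterministically (sharpest samples);
    larger temperatures inject noise, yielding more diverse codes.
    """
    if gumbel_temp == 0.0:
        # Deterministic: pick the more likely bit.
        return (logits > 0).float()
    # Logistic noise equals the difference of two Gumbel samples; scaling
    # it by the temperature controls how far samples stray from the mode.
    u = torch.rand_like(logits).clamp(1e-9, 1 - 1e-9)
    logistic_noise = torch.log(u) - torch.log1p(-u)
    return ((logits + gumbel_temp * logistic_noise) > 0).float()
```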
You can generate 512x512 images using `script/sample_512.sh`. Note that you need to specify the corresponding 512x512 tokenizer and model.

```
bash script/sample_512.sh
```
BiGR supports various zero-shot generalized applications, without the need for task-specific structural changes or parameter fine-tuning.
You can easily download testing images and run our scripts to get started. Feel free to play with your own images.
```
bash script/app_inpaint.sh
bash script/app_outpaint.sh
```
You need to save the source image and the mask in the same folder, with the image as a `*.JPEG` file and the mask as a `*.png` file. You can then specify the source image path `$IMG`.
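For example, the expected folder layout can be prepared like this (a minimal sketch with placeholder content; the folder and file names are hypothetical, and "white marks the region to fill" is an assumed mask convention):

```python
import os
from PIL import Image

# Hypothetical example folder: the source image saved as *.JPEG and the
# mask saved as *.png, side by side, as the inpainting script expects.
folder = "inpaint_example"
os.makedirs(folder, exist_ok=True)

# Dummy 256x256 RGB source image (solid gray placeholder).
Image.new("RGB", (256, 256), color=(128, 128, 128)).save(
    os.path.join(folder, "source.JPEG"))

# Dummy single-channel mask (all black here; white would mark the
# region to fill, under the assumed convention).
Image.new("L", (256, 256), color=0).save(
    os.path.join(folder, "source.png"))
```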
You can customize masks using this Gradio demo.
```
bash script/app_edit.sh
```
In addition to the source image path `$IMG`, you also need to give a class index `$CLS` for editing.
```
bash script/app_interpolate.sh
```
You need to specify two class indices `$CLS1` and `$CLS2`.
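Conceptually (this is a simplification for intuition only, not the repo's exact procedure), interpolating between two classes can be thought of as blending the two conditional Bernoulli probability maps before resampling binary codes:

```python
import torch

def blend_and_sample(p_cls1: torch.Tensor, p_cls2: torch.Tensor,
                     alpha: float) -> torch.Tensor:
    """Hypothetical sketch: mix two Bernoulli probability maps with
    weight alpha in [0, 1], then resample binary codes from the mixture.

    alpha=0 follows class 1; alpha=1 follows class 2.
    """
    p = (1.0 - alpha) * p_cls1 + alpha * p_cls2
    return torch.bernoulli(p)
```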
```
bash script/app_enrich.sh
```
You need to specify the source image path `$IMG`.
You can train BiGR yourself by running:
```
bash script/train.sh
```
You need to specify the ImageNet-1K dataset path via `--data-path`.
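Before launching a long run, a quick sanity check on the dataset path can save time. This hypothetical helper assumes the standard ImageFolder-style layout `root/<class>/<image>.JPEG` (1000 class subfolders for ImageNet-1K train):

```python
import os

def count_imagenet_classes(root: str) -> int:
    """Hypothetical sanity check: count class subfolders under the
    ImageNet-1K root passed via --data-path (expect 1000 for train)."""
    return sum(
        1 for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d))
    )
```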
We train L/XL-sized models using 8 A800 GPUs and XXL-sized models using 32 A800 GPUs on 4 nodes.
This project builds on Diffusion Transformer, Binary Latent Diffusion, and LlamaGen. We thank these great works!
If you use this code in your research, please consider citing our paper:
```
@misc{hao2024bigr,
      title={Bi{GR}: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities},
      author={Shaozhe Hao and Xuantong Liu and Xianbiao Qi and Shihao Zhao and Bojia Zi and Rong Xiao and Kai Han and Kwan-Yee~K. Wong},
      year={2024},
}
```