AiM: Scalable Autoregressive Image Generation with Mamba🐍

arXiv  weights  Open In Colab


💡 What is AiM

To our knowledge, AiM is the first Mamba 🐍 based autoregressive image generation model. It achieves generation quality competitive with diffusion models 💪 while offering faster inference ⚡️.

We also propose a more general form of adaLN, called adaLN-group, which balances parameter count and performance ⚖️. Notably, adaLN-group can be converted equivalently to adaLN or adaLN-single by adjusting the number of groups, as sketched below.
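For intuition, here is a minimal PyTorch sketch of the adaLN-group idea as we understand it: the network's blocks are partitioned into groups, and each group shares one set of adaLN modulation parameters. The module name and tensor shapes are illustrative assumptions, not the repository's actual API.

import torch
import torch.nn as nn

class AdaLNGroup(nn.Module):
    # Hypothetical sketch: num_blocks blocks share num_groups sets of
    # adaLN modulation parameters.
    def __init__(self, num_blocks: int, num_groups: int, dim: int):
        super().__init__()
        assert num_blocks % num_groups == 0
        self.blocks_per_group = num_blocks // num_groups
        # One (scale, shift) projection per *group*, not per block.
        self.mods = nn.ModuleList(nn.Linear(dim, 2 * dim) for _ in range(num_groups))
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)

    def forward(self, x: torch.Tensor, cond: torch.Tensor, block_idx: int):
        # x: (B, L, dim) token sequence; cond: (B, dim) conditioning vector.
        # Blocks within the same group reuse one modulation network.
        group = block_idx // self.blocks_per_group
        scale, shift = self.mods[group](cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

# num_groups == num_blocks recovers per-block adaLN;
# num_groups == 1 reduces to adaLN-single.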

🔔 Update

  • [2024-08-27] Improved HF integration, now supports from_pretrained for direct model loading.
  • [2024-08-23] A minor bug in train_stage2.py has been fixed.
  • [2024-08-23] Code and Model Release.

🚀 Getting Started

Train

accelerate launch --num_processes=32 --num_machines=... --main_process_ip=... --main_process_port=... --machine_rank=... train_stage2.py --aim-model AiM-XL --dataset /your/data/path/ --vq-ckpt /your/ckpt/path/vq_f16.pt --batch-size 64 --lr 8e-4 --epochs 350
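For a quick single-node run, the multi-machine flags can be dropped (a hypothetical example, not from the repository; set --num_processes to your GPU count):

accelerate launch --num_processes=8 train_stage2.py --aim-model AiM-B --dataset /your/data/path/ --vq-ckpt /your/ckpt/path/vq_f16.pt --batch-size 64 --lr 8e-4 --epochs 350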

Inference

You can play with AiM in the Colab notebook (badge above) or locally:

from aim import AiM

# Load the pretrained AiM-XL weights from the Hugging Face Hub
model = AiM.from_pretrained("hp-l33/aim-xlarge").cuda()
model.eval()

# Sample 8 images with top-p/top-k sampling and classifier-free guidance
imgs = model.generate(batch=8, temperature=1, top_p=0.98, top_k=600, cfg_scale=5)
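Assuming imgs is a (B, C, H, W) float tensor in [0, 1] (an assumption; check the repository's actual return type), the samples can be saved with torchvision:

from torchvision.utils import save_image

# Tile the batch into a 4-per-row grid and write it to disk.
save_image(imgs, "samples.png", nrow=4)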

The first time Mamba runs, it invokes the Triton compiler and autotunes its kernels, so the first call may be slow. From the second run onward, inference is fast. See:

state-spaces/mamba#389 (comment)
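A fair way to benchmark is therefore to discard the first call as warm-up. A minimal sketch, reusing the loading code above:

import time
import torch
from aim import AiM

model = AiM.from_pretrained("hp-l33/aim-xlarge").cuda().eval()

# Warm-up call: triggers Triton compilation and kernel autotuning (slow).
model.generate(batch=1, temperature=1, top_p=0.98, top_k=600, cfg_scale=5)
torch.cuda.synchronize()

# Timed call: runs the cached, autotuned kernels (fast).
start = time.time()
imgs = model.generate(batch=8, temperature=1, top_p=0.98, top_k=600, cfg_scale=5)
torch.cuda.synchronize()
print(f"generation took {time.time() - start:.2f}s")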

🤗 Model Zoo

The model weights can be downloaded from Hugging Face (see the weights badge above).

| Model  | Params | FID  | Weight     |
|--------|--------|------|------------|
| AiM-B  | 148M   | 3.52 | aim-base   |
| AiM-L  | 350M   | 2.83 | aim-large  |
| AiM-XL | 763M   | 2.56 | aim-xlarge |

🌹 Acknowledgments

This project would not have been possible without the computational resources provided by Professor Guoqi Li and his team. We also thank the repositories and papers that inspired this work.

📖 BibTeX

@misc{li2024scalableautoregressiveimagegeneration,
      title={Scalable Autoregressive Image Generation with Mamba}, 
      author={Haopeng Li and Jinyue Yang and Kexin Wang and Xuerui Qiu and Yuhong Chou and Xin Li and Guoqi Li},
      year={2024},
      eprint={2408.12245},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.12245}, 
}
