🔥ImageFolder: Autoregressive Image Generation with Folded Tokens

lxa9867/ImageFolder

ImageFolder🚀: Autoregressive Image Generation with Folded Tokens

project page  arXiv  huggingface weights 

Updates

  • (2024.11.14) Code will be released in two weeks (company approval in progress).
  • (2024.10.03) We are working on advanced training of ImageFolder tokenizer.
  • (2024.10.01) Repo created. Code and checkpoints will be released soon.

Ablation (updating)

| ID | Method | Length | rFID ↓ | gFID ↓ | ACC ↑ |
|----|--------|--------|--------|--------|-------|
| 🔶1 | Multi-scale residual quantization (Tian et al., 2024) | 680 | 1.92 | 7.52 | - |
| 🔶2 | + Quantizer dropout | 680 | 1.71 | 6.03 | - |
| 🔶3 | + Smaller patch size K = 11 | 265 | 3.24 | 6.56 | - |
| 🔶4 | + Product quantization & Parallel decoding | 265 | 2.06 | 5.96 | - |
| 🔶5 | + Semantic regularization on all branches | 265 | 1.97 | 5.21 | - |
| 🔶6 | + Semantic regularization on one branch | 265 | 1.57 | 3.53 | 40.5 |
| 🔷7 | + Stronger discriminator | 265 | 1.04 | 2.94 | 50.2 |
| 🔷8 | + Equilibrium enhancement | 265 | 0.80 | 2.60 | 58.0 |

Rows 🔶1-6 are already reported in the released paper. Rows 🔷7 and later use advanced training settings similar to those of VAR (gFID 3.30).
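Row 🔶4 of the ablation refers to product quantization. As a rough illustration of the generic technique only (not the ImageFolder implementation, which is not yet released), the sketch below splits a latent vector into equal chunks and quantizes each chunk with its own codebook. The function names, codebook values, and sizes are all hypothetical and chosen for readability.

```python
# Generic product quantization sketch in pure Python.
# This is NOT the ImageFolder code; all names and values are illustrative assumptions.

def nearest_index(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2 distance)."""
    def sq_dist(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

def product_quantize(vector, codebooks):
    """Split `vector` into len(codebooks) equal chunks and quantize each chunk
    independently with its own codebook. Returns one index per chunk."""
    num_chunks = len(codebooks)
    if len(vector) % num_chunks != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    chunk_len = len(vector) // num_chunks
    indices = []
    for k, codebook in enumerate(codebooks):
        chunk = vector[k * chunk_len:(k + 1) * chunk_len]
        indices.append(nearest_index(chunk, codebook))
    return indices

def reconstruct(indices, codebooks):
    """Concatenate the selected codebook entries to approximate the original vector."""
    approx = []
    for idx, codebook in zip(indices, codebooks):
        approx.extend(codebook[idx])
    return approx

# Two small made-up codebooks, one per chunk of a 4-dimensional latent.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]],
]
latent = [0.9, 0.8, 1.8, 2.1]

indices = product_quantize(latent, codebooks)
approx = reconstruct(indices, codebooks)
print(indices, approx)
```

With codebooks of sizes 2 and 3, the pair of indices can address 2 × 3 = 6 distinct reconstructions while each codebook stays small, which is the usual motivation for quantizing sub-vectors separately.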

Generation

Visualization of Decomposed Token

Acknowledgements

We would like to thank the following repositories: LlamaGen, VAR and ControlVAR.

Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@misc{li2024imagefolderautoregressiveimagegeneration,
      title={ImageFolder: Autoregressive Image Generation with Folded Tokens}, 
      author={Xiang Li and Hao Chen and Kai Qiu and Jason Kuen and Jiuxiang Gu and Bhiksha Raj and Zhe Lin},
      year={2024},
      eprint={2410.01756},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.01756}, 
}