Official code for VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

fenfenfenfan/VMix

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

Shaojin Wu<sup>1</sup>, Fei Ding<sup>1,*</sup>, Mengqi Huang<sup>1,2</sup>, Wei Liu<sup>1</sup>, Qian He<sup>1</sup>
<sup>1</sup> ByteDance Inc.   <sup>2</sup> University of Science and Technology of China

📖 Introduction

We propose VMix, a plug-and-play aesthetics adapter that improves the quality of generated images while maintaining generality across visual concepts. It does so by (1) disentangling the input text prompt into a content description and an aesthetic description through the initialization of an aesthetic embedding, and (2) integrating aesthetic conditions into the denoising process through value-mixed cross-attention, with the network connected by zero-initialized linear layers. VMix outperforms other state-of-the-art methods and is flexible enough to be applied to community modules (e.g., LoRA, ControlNet, and IPAdapter) for better visual performance without retraining.
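Until the official inference code is released, the following NumPy sketch may help illustrate the idea of step (2). It is not the authors' implementation. It assumes one plausible reading of "value-mixed cross-attention": the attention map computed from the content text is reused with a second value projection of the aesthetic embedding, and the result is added through a zero-initialized linear layer. All names (`init_params`, `w_v_aes`, `w_zero`), the shapes, and the assumption that the aesthetic embedding has one row per text token are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(rng, d_model, d_cond):
    # Hypothetical parameter set. w_q, w_k, w_v stand in for the frozen base
    # model's projections; w_v_aes and w_zero stand in for the adapter.
    scale = 1.0 / np.sqrt(d_cond)
    return {
        "w_q": rng.normal(size=(d_model, d_model)) / np.sqrt(d_model),
        "w_k": rng.normal(size=(d_cond, d_model)) * scale,
        "w_v": rng.normal(size=(d_cond, d_model)) * scale,
        "w_v_aes": rng.normal(size=(d_cond, d_model)) * scale,
        # Zero-initialized linear layer: the adapter starts as a no-op.
        "w_zero": np.zeros((d_model, d_model)),
    }

def attention_map(x, text, p):
    # Attention weights between image features and content text tokens.
    q = x @ p["w_q"]
    k = text @ p["w_k"]
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def plain_cross_attention(x, text, p):
    # Standard cross-attention of the base model.
    return attention_map(x, text, p) @ (text @ p["w_v"])

def value_mixed_cross_attention(x, text, aes, p):
    # Assumption: the same attention map is applied to an aesthetic value
    # matrix, and the result is added via the zero-initialized layer.
    attn = attention_map(x, text, p)
    content = attn @ (text @ p["w_v"])
    aesthetic = attn @ (aes @ p["w_v_aes"])
    return content + aesthetic @ p["w_zero"]

rng = np.random.default_rng(0)
params = init_params(rng, d_model=8, d_cond=6)
x = rng.normal(size=(4, 8))     # 4 image feature vectors
text = rng.normal(size=(5, 6))  # 5 content text tokens
aes = rng.normal(size=(5, 6))   # aesthetic embedding, one row per token (assumed)
out = value_mixed_cross_attention(x, text, aes, params)
```

With `w_zero` still at zero, `out` equals the output of `plain_cross_attention`, so under these assumptions attaching the adapter leaves the base model's behavior unchanged until the zero-initialized layer is trained.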

🎨 Examples

Qualitative comparison between results with VMix (right) and without VMix (left)

Aesthetic Fine-grained Control
For more visual results, check out our Project Page.

🔥 Updates

We will open source this project as soon as possible. Thank you for your patience and support! 🌟

  • Release arXiv paper. Check the details here.
  • Release inference code (coming soon).
  • Release model checkpoints.
  • Release ComfyUI node.

Citation

If VMix is helpful, please ⭐ the repo.

If you find this project useful for your research, please consider citing our paper:

@misc{wu2024vmix,
    title={VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control},
    author={Shaojin Wu and Fei Ding and Mengqi Huang and Wei Liu and Qian He},
    year={2024},
    eprint={2412.20800},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
