Shaojin Wu¹, Fei Ding¹*, Mengqi Huang¹,², Wei Liu¹, Qian He¹
¹ByteDance Inc. ²University of Science and Technology of China
We propose VMix, a plug-and-play aesthetics adapter that upgrades the quality of generated images while maintaining generality across visual concepts. VMix works by (1) disentangling the input text prompt into a content description and an aesthetic description via the initialization of an aesthetic embedding, and (2) integrating aesthetic conditions into the denoising process through value-mixed cross-attention, with the network connected by zero-initialized linear layers. VMix outperforms other state-of-the-art methods and is flexible enough to be applied to community modules (e.g., LoRA, ControlNet, and IPAdapter) for better visual performance without retraining.
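Since the official code is not yet released, the snippet below is only a minimal sketch of what value-mixed cross-attention could look like, based on the description above. The class name `ValueMixedCrossAttention`, the extra value projection `to_v_aes`, and the shared-attention-map mixing scheme are our assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only -- the official VMix code is not yet released.
# Assumes a standard scaled-dot-product cross-attention layer. The aesthetic
# branch mixes an extra value projection of the aesthetic context into the
# output, gated by a zero-initialized linear layer so that training starts
# from the unmodified base model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueMixedCrossAttention(nn.Module):
    def __init__(self, dim: int, ctx_dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(ctx_dim, dim, bias=False)
        self.to_v = nn.Linear(ctx_dim, dim, bias=False)      # content values
        self.to_v_aes = nn.Linear(ctx_dim, dim, bias=False)  # aesthetic values (assumed)
        self.to_out = nn.Linear(dim, dim)
        # Zero-initialized projection: the aesthetic branch contributes
        # nothing at the start of training, preserving the pretrained prior.
        self.aes_proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.aes_proj.weight)
        nn.init.zeros_(self.aes_proj.bias)

    def forward(self, x, content_ctx, aes_ctx):
        # x: (B, N, dim); content_ctx, aes_ctx: (B, M, ctx_dim).
        # For simplicity this sketch assumes both contexts share length M.
        B, N, _ = x.shape

        def split(t):  # (B, L, dim) -> (B, heads, L, head_dim)
            return t.view(B, t.shape[1], self.heads, -1).transpose(1, 2)

        q = split(self.to_q(x))
        k = split(self.to_k(content_ctx))
        v = split(self.to_v(content_ctx))
        v_aes = split(self.to_v_aes(aes_ctx))

        # One shared attention map, two value streams: the same attention
        # weights gather both content values and aesthetic values.
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        out_aes = (attn @ v_aes).transpose(1, 2).reshape(B, N, -1)
        return self.to_out(out) + self.aes_proj(out_aes)
```

In a setup like this, such a layer would stand in for the UNet's cross-attention blocks, with `aes_ctx` built from the aesthetic embedding; because `aes_proj` starts at zero, the module initially reproduces the base model exactly.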
Qualitative comparison between results with VMix (right) and without VMix (left)
Aesthetic fine-grained control. For more visual results, check out our Project Page.

We will open-source this project as soon as possible. Thank you for your patience and support! 🌟
- Release arXiv paper. Check the details here.
- Release inference code (coming soon).
- Release model checkpoints.
- Release ComfyUI node.
If you find VMix helpful, please ⭐ the repo.
If you find this project useful for your research, please consider citing our paper:
@misc{wu2024vmix,
  title={VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control},
  author={Shaojin Wu and Fei Ding and Mengqi Huang and Wei Liu and Qian He},
  year={2024},
  eprint={2412.20800},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}