Add support to Meissonic #9875

Open. viiika wants to merge 7 commits into main.

Conversation

@viiika

viiika commented Nov 6, 2024

Meissonic is a non-autoregressive text-to-image synthesis model based on masked image modeling (MIM). It generates high-resolution images and is designed to run on consumer graphics cards.

The model checkpoint is available at https://huggingface.co/MeissonFlow/Meissonic
The inference code is available at https://github.com/viiika/Meissonic
The paper is available at https://arxiv.org/abs/2410.08261
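
As a rough illustration of what "non-autoregressive masked image modeling" means in practice, the sketch below runs a generic MaskGIT-style parallel decoding loop over a toy token grid. It is not Meissonic's implementation: the cosine schedule, the `MASK` sentinel, and the random `toy_predict` stand-in for the transformer are all assumptions made for the example.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a masked image token


def masked_after_step(step, num_steps, num_tokens):
    """Tokens left masked after `step` (1-based) under a cosine schedule."""
    ratio = math.cos(math.pi / 2 * step / num_steps)
    return math.floor(ratio * num_tokens)


def toy_predict(tokens, rng, vocab_size):
    """Stand-in for the transformer: a random (token, confidence) per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(num_tokens=16, num_steps=4, vocab_size=8, seed=0):
    """Fill all masked tokens in `num_steps` passes instead of one token at a time."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    history = []  # number of masked tokens remaining after each step
    for step in range(1, num_steps + 1):
        preds = toy_predict(tokens, rng, vocab_size)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        keep_masked = masked_after_step(step, num_steps, num_tokens)
        # Commit the most confident predictions; the rest stay masked.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[: len(masked) - keep_masked]:
            tokens[i] = preds[i][0]
        history.append(sum(t == MASK for t in tokens))
    return tokens, history


if __name__ == "__main__":
    final_tokens, remaining = parallel_decode()
    print("masked tokens remaining per step:", remaining)
```

Every position is predicted at every step, and only the number of committed tokens changes, which is why the step count can be far smaller than the token count.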

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@viiika
Author

viiika commented Nov 11, 2024

Any update or suggestion for this PR?

@ParagEkbote
Contributor

@viiika You need to tag the relevant maintainers of this repository. Also, creating a separate branch of your fork for submitting the PR is recommended.

Do you have any more advice? @sayakpaul


@viiika
Author

viiika commented Nov 14, 2024


Thanks for your advice. I am just waiting for an approving review.

@sayakpaul
Member

I will let @yiyixuxu decide whether this should be a core pipeline. Maybe you could supplement this PR with some examples and memory consumption numbers? @viiika

Given that it's masked image generation (we only have one such pipeline in diffusers, aMUSEd) and works on consumer GPUs, I would prefer having it as a core pipeline.

@viiika
Author

viiika commented Nov 14, 2024


Thank you for your comments! We'd like to highlight some key differences between Meissonic and aMUSEd:

  1. Transformer Backbone: Meissonic uses a modified MMDiT-based transformer backbone.
  2. Positional Embedding: Meissonic uses rotary positional embeddings (RoPE).
  3. Sampling Conditions: Meissonic uses the masking rate as a sampling condition.
  4. Resolution Support: Meissonic supports resolutions up to 1024 x 1024 and remains efficient at that resolution, which keeps it competitive with the latest state-of-the-art diffusion models.

These advancements collectively contribute to Meissonic's performance being comparable to SDXL.
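
For readers unfamiliar with RoPE (item 2 above), here is a minimal, generic 1D sketch in plain Python. It only illustrates the general idea that rotating query and key vectors by position-dependent angles makes their dot product depend on relative position. How Meissonic applies RoPE to image tokens is not described in this thread, so the function name, the `base` value, and the 1D setup are assumptions.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that scale with `position`."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, dim, 2):
        # Pair k = i / 2 uses frequency base^(-2k / dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5]
    k = [1.1, 0.4, -0.6, 0.9]
    # Both pairs of positions have the same offset of 2.
    print(dot(rope(q, 3), rope(k, 1)))
    print(dot(rope(q, 7), rope(k, 5)))
```

The two printed scores should agree up to floating-point error, because only the offset between the query and key positions enters the attention score.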

Additionally, it seems there have been no further updates to aMUSEd, but we want to emphasize that Meissonic will continue to evolve. We plan to release Meissonic II in three months, with even more impressive performance.

The original VRAM consumption values are provided below. We have also developed FP8, INT8, and INT4 versions, which need as little as 4 GB of VRAM to generate high-quality 1024 x 1024 images.
[Figure: GPU memory comparison]
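
To give a feel for why lower-precision weights reduce VRAM, the following back-of-the-envelope sketch computes weight storage alone at different bit widths. The parameter count used in the example is a hypothetical placeholder, not a figure from this thread, and the estimate ignores activations, the text encoder, the image tokenizer, and framework overhead, so it does not reproduce the measured numbers in the figure.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Bit widths for the precisions mentioned in the comment above.
PRECISION_BITS = {"FP16": 16, "FP8": 8, "INT8": 8, "INT4": 4}

if __name__ == "__main__":
    hypothetical_params = 1_000_000_000  # assumption for illustration only
    for name, bits in PRECISION_BITS.items():
        gib = weight_memory_gib(hypothetical_params, bits)
        print(f"{name}: {gib:.2f} GiB for weights")
```

Halving the bit width halves the weight footprint, but total VRAM use also depends on the components left out here.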

@viiika
Author

viiika commented Nov 14, 2024

Some samples are provided to showcase the performance of Meissonic. Additionally, users can refer to the appendix of the technical paper or explore the Hugging Face Spaces to experience the impressive capabilities of Meissonic firsthand.

[Figure: appendix_figure_11]
[Figure: appendix_figure_hps_5]

@viiika
Author

viiika commented Nov 14, 2024

We have also included the HPSv2.0 scores to demonstrate the performance.

| Model | Animation | Concept-art | Painting | Photo | Averaged |
|---|---|---|---|---|---|
| Latent Diffusion Rombach et al. | 25.73 | 25.15 | 25.25 | 26.97 | 25.78 |
| VQGAN + CLIP Esser et al. | 26.44 | 26.53 | 26.47 | 26.12 | 26.39 |
| CogView2 Ding et al. | 26.50 | 26.59 | 26.33 | 26.44 | 26.47 |
| DALL·E 2 Ramesh et al. | 27.34 | 26.54 | 26.68 | 27.24 | 26.95 |
| Stable Diffusion v1.4 Rombach et al. | 27.26 | 26.61 | 26.66 | 27.27 | 26.95 |
| Stable Diffusion v2.0 Rombach et al. | 27.48 | 26.89 | 26.86 | 27.46 | 27.17 |
| DeepFloyd-XL DeepFloyd et al. | 27.64 | 26.83 | 26.86 | 27.75 | 27.27 |
| Deliberate Perfect Deliberate | 28.13 | 27.46 | 27.45 | 27.62 | 27.67 |
| SDXL Base 0.9 Podell et al. | 28.42 | 27.63 | 27.60 | 27.29 | 27.73 |
| Realistic Vision RealVisXL | 28.22 | 27.53 | 27.56 | 27.75 | 27.77 |
| SDXL Refiner 0.9 Podell et al. | 28.45 | 27.66 | 27.67 | 27.46 | 27.80 |
| Dreamlike Photoreal 2.0 Dreamlike Photoreal | 28.24 | 27.60 | 27.59 | 27.99 | 27.86 |
| SDXL Base 1.0 Podell et al. | 28.88 | 27.88 | 27.92 | 28.31 | 28.25 |
| SDXL Refiner 1.0 Podell et al. | 28.93 | 27.89 | 27.90 | 28.38 | 28.27 |
| Meissonic-512 | 28.90 | 28.15 | 28.22 | 28.04 | 28.33 |
| Meissonic | 29.57 | 28.58 | 28.72 | 28.45 | 28.83 |
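
The "Averaged" column appears to be the plain mean of the four category scores. The short sketch below recomputes it for a few rows copied from the table, as a sanity check. The 0.01 tolerance is an assumption that allows for the reported averages having been computed from unrounded per-category scores.

```python
# (category scores, reported average) for selected rows of the table above.
ROWS = {
    "SDXL Base 1.0": ([28.88, 27.88, 27.92, 28.31], 28.25),
    "SDXL Refiner 1.0": ([28.93, 27.89, 27.90, 28.38], 28.27),
    "Meissonic-512": ([28.90, 28.15, 28.22, 28.04], 28.33),
    "Meissonic": ([29.57, 28.58, 28.72, 28.45], 28.83),
}


def category_mean(scores):
    """Unweighted mean over Animation, Concept-art, Painting, and Photo."""
    return sum(scores) / len(scores)


def check_rows(rows, tol=0.01):
    """Map each model to True if its recomputed mean matches the reported one."""
    return {
        name: abs(category_mean(scores) - reported) <= tol
        for name, (scores, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        mean = category_mean(ROWS[name][0])
        print(f"{name}: recomputed {mean:.4f}, consistent with table: {ok}")
```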
