Add support to Meissonic #9875

viiika · 2024-11-06T11:42:30Z

Meissonic is a non-autoregressive masked image modeling (MIM) text-to-image model that generates high-resolution images and is designed to run on consumer GPUs.
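To make "non-autoregressive masked image modeling" concrete: instead of emitting image tokens one at a time, the model starts from a fully masked token grid and, at each step, predicts every masked token in parallel and keeps only the most confident ones. The sketch below shows that idea using the cosine masking schedule from MaskGIT. It is a generic illustration, not Meissonic's actual sampler; `predict` is a hypothetical stand-in for the transformer, and `MASK` is an assumed sentinel id.

```python
import math
import random
from typing import Callable, List, Tuple

MASK = -1  # hypothetical sentinel id marking a still-masked token


def num_masked(step: int, total_steps: int, num_tokens: int) -> int:
    """Cosine schedule (MaskGIT): how many tokens stay masked after `step` of `total_steps`."""
    ratio = math.cos(math.pi / 2 * step / total_steps)  # masking ratio goes 1.0 -> 0.0
    return math.floor(ratio * num_tokens)


def decode(
    num_tokens: int,
    total_steps: int,
    predict: Callable[[List[int]], List[Tuple[int, float]]],
) -> List[int]:
    """Start fully masked; each step, commit the most confident predictions in parallel."""
    tokens = [MASK] * num_tokens
    for step in range(1, total_steps + 1):
        preds = predict(tokens)  # (token_id, confidence) for every position
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        n_reveal = len(masked) - num_masked(step, total_steps, num_tokens)
        # Reveal the masked positions the model is most confident about.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_reveal]:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    random.seed(0)

    def toy_model(toks: List[int]) -> List[Tuple[int, float]]:
        # Toy "model": always predicts token i % 7 with a random confidence.
        return [(i % 7, random.random()) for i in range(len(toks))]

    print(decode(num_tokens=16, total_steps=8, predict=toy_model))
```

Because the schedule reaches zero masked tokens at the final step, every position is filled after `total_steps` forward passes, regardless of how many tokens the image has.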

The model checkpoint is available at https://huggingface.co/MeissonFlow/Meissonic
The inference code is available at https://github.com/viiika/Meissonic
The paper is available at https://arxiv.org/abs/2410.08261

HuggingFaceDocBuilderDev · 2024-11-06T21:22:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

viiika · 2024-11-11T06:07:58Z

Any update or suggestion for this PR?

ParagEkbote · 2024-11-14T07:48:41Z

@viiika You need to tag the relevant maintainers of this repository. Also, creating a separate branch of your fork for submitting the PR is recommended.

Do you have any more advice? @sayakpaul

viiika · 2024-11-14T07:51:25Z

Thanks for your advice. I am just waiting for an approving review.

sayakpaul · 2024-11-14T10:11:35Z

I will let @yiyixuxu comment on deciding if this should be a core pipeline. Maybe you could supplement this PR with some examples and memory consumption numbers? @viiika

Given it's masked image generation (we only have one such model in diffusers, aMUSEd) and works on consumer GPUs, I would prefer having it as a core pipeline.

viiika · 2024-11-14T11:17:15Z

Thank you for your comments! We'd like to highlight some key differences between Meissonic and aMUSEd:

- Transformer Backbone: Meissonic uses a modified MMDiT-based transformer backbone.
- Positional Embedding: Meissonic switches to rotary positional embeddings (RoPE).
- Sampling Conditions: Meissonic uses the masking rate as a sampling condition.
- Resolution Support: Meissonic supports resolutions up to 1024 × 1024 and runs efficiently at that resolution, which keeps it competitive with the latest state-of-the-art diffusion models.
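For readers unfamiliar with RoPE: it rotates each consecutive pair of query/key features by an angle proportional to the token's position, so the attention score between two tokens depends only on their relative offset. Below is a minimal sketch of the standard 1D formulation from RoFormer (base 10000). Meissonic's exact variant (for example, how positions are assigned over a 2D image-token grid) is not described in this thread, so treat this as an illustration of the general technique only.

```python
import math
from typing import List


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive feature pairs of `x` by angles proportional to `pos`."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even feature dimension")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # pair k = i // 2 uses frequency base^(-2k/d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[i], x[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]  # 2D rotation of (x0, x1)
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.2]
    # Same relative offset (2) at different absolute positions gives the same score.
    print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

Since each pair is only rotated, vector norms are preserved; the position information enters the attention score purely through the angle difference between query and key.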

Together, these changes bring Meissonic's performance to a level comparable with SDXL.

Additionally, it seems there have been no further updates to aMUSEd, whereas Meissonic will continue to evolve: we plan to release Meissonic II in three months, with stronger performance.

The original (unquantized) VRAM consumption values are provided below. We have also developed FP8, INT8, and INT4 quantized versions, which bring the VRAM requirement down to as little as 4 GB for generating high-quality 1024 × 1024 images.
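As a rough guide to why lower-precision versions shrink memory use, the weight footprint scales with bytes per parameter. The back-of-the-envelope sketch below estimates weight memory only; the parameter count in the usage example is hypothetical and is not Meissonic's actual size, and real VRAM use is higher because activations, other model components, and quantization metadata are ignored.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory to hold the weights alone, in decimal GB (1e9 bytes).

    Ignores activations, the text encoder and image tokenizer/decoder,
    quantization scales/zero-points, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical parameter count, for illustration only
    for dtype in ("fp16", "fp8", "int8", "int4"):
        print(f"{dtype}: {weight_memory_gb(n, dtype):.2f} GB")
```

Going from FP16 to INT4 cuts weight storage by 4×, which is why the quantized variants matter most on GPUs with limited VRAM.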

viiika · 2024-11-14T11:35:21Z

Some samples are provided to showcase Meissonic's performance. For more examples, see the appendix of the technical paper, or try Meissonic directly in the Hugging Face Spaces.

viiika · 2024-11-14T11:40:35Z

We have also included HPSv2.0 (Human Preference Score v2) scores to demonstrate performance; higher is better.

| Model | Animation | Concept-art | Painting | Photo | Averaged |
|---|---|---|---|---|---|
| Latent Diffusion (Rombach et al.) | 25.73 | 25.15 | 25.25 | 26.97 | 25.78 |
| VQGAN + CLIP (Esser et al.) | 26.44 | 26.53 | 26.47 | 26.12 | 26.39 |
| CogView2 (Ding et al.) | 26.50 | 26.59 | 26.33 | 26.44 | 26.47 |
| DALL·E 2 (Ramesh et al.) | 27.34 | 26.54 | 26.68 | 27.24 | 26.95 |
| Stable Diffusion v1.4 (Rombach et al.) | 27.26 | 26.61 | 26.66 | 27.27 | 26.95 |
| Stable Diffusion v2.0 (Rombach et al.) | 27.48 | 26.89 | 26.86 | 27.46 | 27.17 |
| DeepFloyd-XL (DeepFloyd et al.) | 27.64 | 26.83 | 26.86 | 27.75 | 27.27 |
| Deliberate (Perfect Deliberate) | 28.13 | 27.46 | 27.45 | 27.62 | 27.67 |
| SDXL Base 0.9 (Podell et al.) | 28.42 | 27.63 | 27.60 | 27.29 | 27.73 |
| Realistic Vision (RealVisXL) | 28.22 | 27.53 | 27.56 | 27.75 | 27.77 |
| SDXL Refiner 0.9 (Podell et al.) | 28.45 | 27.66 | 27.67 | 27.46 | 27.80 |
| Dreamlike Photoreal 2.0 (Dreamlike Photoreal) | 28.24 | 27.60 | 27.59 | 27.99 | 27.86 |
| SDXL Base 1.0 (Podell et al.) | 28.88 | 27.88 | 27.92 | 28.31 | 28.25 |
| SDXL Refiner 1.0 (Podell et al.) | 28.93 | 27.89 | 27.90 | 28.38 | 28.27 |
| Meissonic-512 | 28.90 | 28.15 | 28.22 | 28.04 | 28.33 |
| Meissonic | 29.57 | 28.58 | 28.72 | 28.45 | 28.83 |
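The "Averaged" column appears to be the plain arithmetic mean of the four category scores. The snippet below checks that assumption on three rows, using values copied from the table; it is a reader-side sanity check, not part of the original evaluation code.

```python
from statistics import mean

# Per-category HPSv2.0 scores copied from the table (Animation, Concept-art, Painting, Photo).
SCORES = {
    "SDXL Base 1.0": [28.88, 27.88, 27.92, 28.31],
    "Meissonic-512": [28.90, 28.15, 28.22, 28.04],
    "Meissonic": [29.57, 28.58, 28.72, 28.45],
}

for model, per_category in SCORES.items():
    # Compare this against the "Averaged" column (reported to two decimals).
    print(f"{model}: {mean(per_category):.4f}")
```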

viiika added 2 commits November 6, 2024 19:36

- 322d90a add support to Meissonic
- aad05e5 add support to Meissonic

viiika mentioned this pull request Nov 6, 2024: Add support to Meissonic #9794 (Open)

viiika added 4 commits November 7, 2024 10:41

- c6fd324 fix issues mentioned in check_code_quality
- c9faab8 fix file conflicts
- 5f5c34e Merge branch 'main' into main
- 34efe48 Merge branch 'main' into main
- 7644e97 Merge branch 'main' into main
