Conversation

@leffff commented Oct 13, 2025

What does this PR do?

This PR adds Kandinsky5T2VPipeline and Kandinsky5Transformer3DModel, as well as several layer classes needed for the Kandinsky 5.0 Lite T2V model.

@sayakpaul Please review

@sayakpaul requested review from DN6 and yiyixuxu on October 14, 2025
@sayakpaul (Member)

Could you please update the PR with test code and some example outputs?

@leffff (Author) commented Oct 14, 2025

Sure!

@leffff (Author) commented Oct 14, 2025

Dear @sayakpaul @yiyixuxu @DN6
What should the test code and example outputs look like?

@leffff (Author) commented Oct 14, 2025

import torch
from diffusers import Kandinsky5T2VPipeline
from diffusers.utils import export_to_video

pipe = Kandinsky5T2VPipeline.from_pretrained(
    "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

negative_prompt = [
    "Static, 2D cartoon, cartoon, 2d animation, paintings, images, worst quality, low quality, ugly, deformed, walking backwards",
]
prompt = [
    "A cat and a dog baking a cake together in a kitchen.",
]

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=768,
    num_frames=121,
    num_inference_steps=50,
    guidance_scale=5.0,
    num_videos_per_prompt=1,
    # torch.Generator takes a device, not a seed; seed it via manual_seed
    generator=torch.Generator(device="cuda").manual_seed(42),
)
# Save the first generated video, assuming the pipeline returns a .frames
# attribute like other diffusers video pipelines (fps=24 is an assumption).
export_to_video(output.frames[0], "output.mp4", fps=24)
[video: output.10.mp4]

prompt = [
    "A monkey riding a skateboard",
]

[video: output.10.mp4]

prompt = [
    "Several giant wooly mammoths threading through the meadow",
]

[video: output.10.mp4]

@sayakpaul (Member)

Great, thanks for providing the examples! Does the model also do realistic generations? 👀

@linoytsaban @apolinario @asomoza in case you wanna test it?

@leffff (Author) commented Oct 14, 2025

Yes, of course!

A stylish woman struts confidently down a rain-drenched Tokyo street, where vibrant neon signs flicker and pulse with electric color. She wears a sleek black leather jacket over a flowing red dress, paired with polished black boots and a matching black purse. Her sunglasses reflect the glowing cityscape as she moves with a calm, assured demeanor, red lipstick adding a bold contrast to her look. The wet pavement mirrors the dazzling lights, doubling the intensity of the urban glow around her. Pedestrians bustle along the sidewalks, their silhouettes blending into the dynamic, cinematic atmosphere of the neon-lit metropolis.

[video: output.10.mp4]

A cinematic movie trailer unfolds with a 30-year-old space man traversing a vast salt desert beneath a brilliant blue sky. He wears a uniquely styled red wool knitted motorcycle helmet, adding an eccentric yet rugged charm to his spacefaring look. As he rides a retro-futuristic vehicle across the shimmering white terrain, the wind kicks up clouds of glittering salt, creating a surreal atmosphere. The scene is captured in a vivid, cinematic style, shot on 35mm film to enhance the nostalgic and dramatic grain. Explosions of color and dynamic camera movements highlight the space man's daring escape from a collapsing alien base in the distance.

[video: output.11.mp4]

@asomoza (Member) left a comment
Thanks, looks cool! I left some suggestions for unused imports.

"""
A 3D Diffusion Transformer model for video-like data.
"""

A contributor commented:

Suggested change:

_repeated_blocks = [
    "Kandinsky5TransformerEncoderBlock",
    "Kandinsky5TransformerDecoderBlock",
]

Should we declare repeated blocks, @sayakpaul?

A collaborator replied:

Yes, let's add that.
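
For context, a minimal sketch of what declaring the repeated blocks could look like. The class body is abbreviated here, and the comment's claim about usage assumes the _repeated_blocks attribute feeds diffusers' regional compilation helper (ModelMixin.compile_repeated_blocks):

from diffusers.configuration_utils import ConfigMixin
from diffusers.models.modeling_utils import ModelMixin


class Kandinsky5Transformer3DModel(ModelMixin, ConfigMixin):
    # Block class names that repeat through the network; diffusers can then
    # compile one instance of each and reuse the compiled graph for the rest.
    _repeated_blocks = [
        "Kandinsky5TransformerEncoderBlock",
        "Kandinsky5TransformerDecoderBlock",
    ]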

@leffff (Author) commented Oct 16, 2025

@yiyixuxu
I've made lots of corrections; please review them. I've gone through all of the feedback, tackling every issue!


key = apply_rotary(key, rope).type_as(key)

if sparse_params is not None:
    out = self.nabla(query, key, value, sparse_params=sparse_params)
A collaborator commented:

Can we look into refactoring the attention to use dispatch_attention_fn instead, so that we can use different attention implementations (flex or others) out of the box?

see this PR #11916
reference code: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_flux.py#L118
doc: https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends
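
As an illustration, here is a rough sketch of a processor routed through dispatch_attention_fn, adapted from the Flux reference above. The attn module attributes (to_q/to_k/to_v/to_out, heads) and the apply_rotary helper are assumptions based on this PR's code, not the final implementation:

from diffusers.models.attention_dispatch import dispatch_attention_fn


class Kandinsky5AttnProcessor:
    _attention_backend = None  # set by diffusers' attention-backend machinery

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, rotary_emb=None):
        if encoder_hidden_states is None:
            encoder_hidden_states = hidden_states

        # Project and reshape to (batch, seq_len, heads, head_dim),
        # the layout dispatch_attention_fn expects.
        query = attn.to_q(hidden_states).unflatten(-1, (attn.heads, -1))
        key = attn.to_k(encoder_hidden_states).unflatten(-1, (attn.heads, -1))
        value = attn.to_v(encoder_hidden_states).unflatten(-1, (attn.heads, -1))

        if rotary_emb is not None:
            # apply_rotary is the RoPE helper from this PR (assumed in scope)
            query = apply_rotary(query, rotary_emb).type_as(query)
            key = apply_rotary(key, rotary_emb).type_as(key)

        # SDPA by default; flash/flex/etc. when a backend is configured.
        hidden_states = dispatch_attention_fn(
            query,
            key,
            value,
            attn_mask=attention_mask,
            backend=self._attention_backend,
        )
        hidden_states = hidden_states.flatten(2, 3)
        return attn.to_out[0](hidden_states)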

@leffff (Author) replied:

We want to move forward with the integration as soon as possible.
Can we contribute the code as-is, and add support for the 10 sec models later?

@leffff (Author) added:

We did indeed propose a new attention algorithm.
So does that require implementing it as a new attention backend?

A collaborator replied:

OK, so I think by default we still use SDPA, no? The flex stuff is only optional, and the user has to configure attention_type to be flex in order to use it. If that's the case, I think the fastest way to get this PR in is to remove all the flex-related stuff for now, and add it back in a follow-up PR using dispatch_attention_fn.

@leffff (Author) replied:

Ok, I see!
You want to have a KandinskyAttnProcessor with several backends. Okay.

A collaborator replied:

Yes, you should be able to structure KandinskyAttnProcessor to use dispatch_attention_fn so that it works with different backends out of the box, instead of having to handle them manually like the current code does.
But if that will take too much time, we can just support the default one and make it work with dispatch_attention_fn in a follow-up PR.
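
For reference, a hedged usage sketch of what backend switching could look like once the processor goes through dispatch_attention_fn, reusing the pipe from the example above; backend availability beyond the SDPA default depends on the installed kernels:

from diffusers.models.attention_dispatch import attention_backend

# Persistent switch on the model (SDPA remains the default when unset):
pipe.transformer.set_attention_backend("flex")

# Or scoped to a single call via the context manager:
with attention_backend("flash"):
    output = pipe(prompt=prompt, negative_prompt=negative_prompt)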

@yiyixuxu (Collaborator) commented Oct 17, 2025

@leffff
I refactored the Kandinsky 5 attention in this commit: acabbc0
You can cherry-pick that commit or just do something similar to what I did there.

I tested the default SDPA backend; I did not test flex, but the code/logic should look roughly like that.
