Add SkyReels V2: Infinite-Length Film Generative Model #11518

Open
wants to merge 283 commits into
base: main
283 commits
ded93bc
Enhance `SkyReelsV2DiffusionForcingImageToVideoPipeline` by refining …
tolgacangoz May 13, 2025
c9483b2
Remove unused dtype handling in `SkyReelsV2DiffusionForcingPipeline` …
tolgacangoz May 13, 2025
f7fed01
up
tolgacangoz May 13, 2025
0e7b21d
up
tolgacangoz May 13, 2025
b3698d7
Update references
tolgacangoz May 13, 2025
7e0f0f5
Add `generate_timestep_matrix` method to `SkyReelsV2DiffusionForcingP…
tolgacangoz May 14, 2025
47080c2
Merge branch 'main' into skyreels-v2
tolgacangoz May 14, 2025
8c23208
Remove training-related code
tolgacangoz May 14, 2025
1f8e268
Add gradient checkpointing support in `SkyReelsV2Transformer3DModel` …
tolgacangoz May 14, 2025
d853521
Refactor `SkyReelsV2TransformerBlock` and remove unused `Head` class.…
tolgacangoz May 14, 2025
2b79584
Remove unused parameter `y` and associated documentation from `SkyRee…
tolgacangoz May 14, 2025
600ced3
Update context length calculation in `SkyReelsV2AttnProcessor2_0` to …
tolgacangoz May 14, 2025
586fe56
Fix comparison logic in `SkyReelsV2AttnProcessor2_0` to correctly det…
tolgacangoz May 14, 2025
afcaf6e
Remove unused `flex_attention` variable from `transformer_skyreels_v2…
tolgacangoz May 14, 2025
465df8c
Updates SkyReelsV2 pipeline defaults and docs
tolgacangoz May 15, 2025
cad2d38
Remove `enable_teacache` functionality from `SkyReelsV2Transformer3DM…
tolgacangoz May 15, 2025
1fcdf98
Refactor `SkyReelsV2Transformer3DModel` to use configuration paramete…
tolgacangoz May 15, 2025
6d57725
Remove unused import of `numpy` and clean up whitespace in `transform…
tolgacangoz May 15, 2025
c4cec04
Refactor `SkyReelsV2DiffusionForcingPipeline` to improve error handli…
tolgacangoz May 15, 2025
6a85ba1
Refactor `SkyReelsV2DiffusionForcingPipeline` to enhance sample sched…
tolgacangoz May 16, 2025
76af29b
update template for df_i2v
tolgacangoz May 16, 2025
81206ce
style
tolgacangoz May 16, 2025
906b6f5
Refactor `SkyReelsV2DiffusionForcingPipeline` to improve the handling…
tolgacangoz May 16, 2025
e2391b6
Add newly released `SkyReelsV2DiffusionForcingVideoToVideoPipeline` t…
tolgacangoz May 16, 2025
245534f
up df_i2v
tolgacangoz May 16, 2025
aaa8a8b
Refactor `SkyReelsV2DiffusionForcingPipeline` to improve the handling…
tolgacangoz May 16, 2025
ca3f7bd
Integrate video decoding in pipeline
tolgacangoz May 18, 2025
b4e26fd
up
tolgacangoz May 18, 2025
c3bcd1d
Fix variable name typo in `SkyReelsV2DiffusionForcingPipeline` from `…
tolgacangoz May 18, 2025
c9bea14
Fix variable name from `casual_block_size` to `causal_block_size` for…
tolgacangoz May 18, 2025
00fdeb0
Update `_no_split_modules` in `SkyReelsV2Transformer3DModel` and adju…
tolgacangoz May 18, 2025
cf91fb4
Refactor type hint for `device` parameter in `_prepare_blockwise_caus…
tolgacangoz May 18, 2025
256fa6d
Refactor `SkyReelsV2DiffusionForcingPipeline` to streamline the setti…
tolgacangoz May 19, 2025
a74252c
Add `flag_df` parameter to `SkyReelsV2Transformer3DModel` for improve…
tolgacangoz May 19, 2025
771fb05
Refactor `SkyReelsV2DiffusionForcingPipeline` to enhance clarity and …
tolgacangoz May 19, 2025
8e61893
Merge branch 'main' into skyreels-v2
tolgacangoz May 19, 2025
bccad55
Add script for converting SkyReelsV2 models to Diffusers format
tolgacangoz May 20, 2025
59c1e88
down
tolgacangoz May 20, 2025
02f038d
Update documentation in `SkyReelsV2DiffusionForcingPipeline` to clari…
tolgacangoz May 20, 2025
32ca01a
up
tolgacangoz May 20, 2025
02ffe0c
Refactor model directory path handling in `convert_transformer` funct…
tolgacangoz May 20, 2025
a215677
fix "inject_sample_info": true,
tolgacangoz May 20, 2025
1e4c501
temp fix
tolgacangoz May 20, 2025
322ce0c
up
tolgacangoz May 20, 2025
b7d54d6
fix `qk_norm`
tolgacangoz May 20, 2025
be77ad8
Refactor `convert_skyreelsv2_to_diffusers.py` to use `SkyreelsV2Image…
tolgacangoz May 20, 2025
6f8ffb2
for vae
tolgacangoz May 20, 2025
4576f6e
for t5
tolgacangoz May 20, 2025
10174ca
up
tolgacangoz May 20, 2025
9223f2d
temp fix
tolgacangoz May 20, 2025
a1aadd3
up
tolgacangoz May 20, 2025
f369cc4
Remove assertion for 1D timesteps in `get_timestep_embedding` functio…
tolgacangoz May 21, 2025
eb32376
Refactor timestep handling in `SkyReelsV2DiffusionForcingPipeline` to…
tolgacangoz May 21, 2025
671b37e
Enhance `get_timestep_embedding` to support 2D tensor inputs, allowin…
tolgacangoz May 21, 2025
6f8bf30
Fix unflattening of timestep projection in `SkyReelsV2Transformer3DMo…
tolgacangoz May 21, 2025
c71d3aa
Update dtype handling in `SkyReelsV2Transformer3DModel` to ensure con…
tolgacangoz May 21, 2025
1afa337
Refactor tensor reshaping in `SkyReelsV2Transformer3DModel` to utiliz…
tolgacangoz May 21, 2025
c74675c
Refactor timestep preparation in `SkyReelsV2DiffusionForcingPipeline`…
tolgacangoz May 21, 2025
602cff7
fix: multi-dimentional indexing
tolgacangoz May 21, 2025
237e468
Comment out tensor unsqueezing in `SkyReelsV2DiffusionForcingPipeline…
tolgacangoz May 21, 2025
40c456d
Update dtype handling in `SkyReelsV2DiffusionForcingPipeline` to use …
tolgacangoz May 21, 2025
9ed88da
fix dype
tolgacangoz May 21, 2025
6a3c7bf
fix
tolgacangoz May 21, 2025
5652aa0
Refactor sample scheduler initialization in `SkyReelsV2DiffusionForci…
tolgacangoz May 21, 2025
e529fea
Adds shift parameter to scheduler timestep setting
tolgacangoz May 21, 2025
b3ffeca
Fix slicing of latents in `SkyReelsV2DiffusionForcingPipeline` to ens…
tolgacangoz May 21, 2025
4479afc
Fix tensor slicing in `SkyReelsV2DiffusionForcingPipeline` to ensure …
tolgacangoz May 21, 2025
e4f6743
Update progress bar total in `SkyReelsV2DiffusionForcingPipeline` to …
tolgacangoz May 23, 2025
7420446
Refactor error handling and tensor processing in `SkyReelsV2Diffusion…
tolgacangoz May 23, 2025
2d59ebd
Refactor tensor processing and noise application in `SkyReelsV2Diffus…
tolgacangoz May 23, 2025
8af4a9f
Refactor variable naming and tensor handling in `SkyReelsV2DiffusionF…
tolgacangoz May 23, 2025
57a2bf9
style
tolgacangoz May 23, 2025
ae6adbe
fix number of frames for long video generation
tolgacangoz May 24, 2025
9afb214
up
tolgacangoz May 24, 2025
f1483ad
fix: `latents` initialization for long video generation in processing…
tolgacangoz May 24, 2025
a16c31b
update templates
tolgacangoz May 24, 2025
3b7b63b
Enhance `convert_skyreelsv2_to_diffusers.py` by adding support for lo…
tolgacangoz May 24, 2025
5e1126d
Update model configuration in `convert_skyreelsv2_to_diffusers.py` to…
tolgacangoz May 24, 2025
820d415
Refactor `set_ar_attention` method in `SkyReelsV2Transformer3DModel` …
tolgacangoz May 24, 2025
528e0d7
up
tolgacangoz May 25, 2025
6c4301c
up
tolgacangoz May 25, 2025
7d5328f
upp
tolgacangoz May 25, 2025
00849fd
fix file name
tolgacangoz May 25, 2025
8e34d89
Update `SkyReelsV2Transformer3DModel` to conditionally apply `causal_…
tolgacangoz May 25, 2025
493a08c
Merge branch 'main' into skyreels-v2
tolgacangoz May 25, 2025
a6f0d11
style
tolgacangoz May 25, 2025
cc0660c
Fix class name casing for SkyReelsV2 components in multiple files to …
tolgacangoz May 25, 2025
14d8d7a
cleaning
tolgacangoz May 25, 2025
85a1f90
cleansing
tolgacangoz May 25, 2025
5264ac9
Refactor `get_timestep_embedding` to move modifications into `SkyReel…
tolgacangoz May 26, 2025
81acfae
Remove unnecessary line break in `get_timestep_embedding` function fo…
tolgacangoz May 26, 2025
11baa00
Remove `skyreels_v2` entry from `_import_structure` and update its in…
tolgacangoz May 26, 2025
2906c37
cleansing
tolgacangoz May 26, 2025
a38eaab
Refactor attention processing in `SkyReelsV2AttnProcessor2_0` to alwa…
tolgacangoz May 26, 2025
150ea56
Enhance example usage in `pipeline_skyreels_v2_diffusion_forcing.py` …
tolgacangoz May 26, 2025
ad7d4c4
Refactor import structure in `__init__.py` for SkyReelsV2 components …
tolgacangoz May 26, 2025
ed7843a
Merge branch 'main' into skyreels-v2
tolgacangoz May 26, 2025
f1ee024
Update `guidance_scale` parameter in `SkyReelsV2DiffusionForcingPipel…
tolgacangoz May 26, 2025
421e0dc
Update `guidance_scale` parameter in example documentation and class …
tolgacangoz May 26, 2025
4b688c4
Update `causal_block_size` parameter in `SkyReelsV2DiffusionForcingPi…
tolgacangoz May 26, 2025
c6b5391
up
tolgacangoz May 26, 2025
3bf1e4a
Fix dtype conversion for `timestep_proj` in `SkyReelsV2Transformer3DM…
tolgacangoz May 26, 2025
f48363c
Optimize causal mask generation by replacing repeated tensor with `re…
tolgacangoz May 26, 2025
920d956
style
tolgacangoz May 26, 2025
cedee34
Merge branch 'main' into skyreels-v2
tolgacangoz May 26, 2025
db9cda9
Enhance example documentation in `SkyReelsV2DiffusionForcingPipeline`…
tolgacangoz May 27, 2025
ff6eeea
Refactor sample scheduler creation in `SkyReelsV2DiffusionForcingPipe…
tolgacangoz May 27, 2025
82db3ab
Merge branch 'main' into skyreels-v2
tolgacangoz May 27, 2025
c0abccc
Enhance error handling and documentation in `SkyReelsV2DiffusionForci…
tolgacangoz May 27, 2025
35061d0
Update documentation and progress bar handling in `SkyReelsV2Diffusio…
tolgacangoz May 27, 2025
cede08c
Refine progress bar calculation in `SkyReelsV2DiffusionForcingPipelin…
tolgacangoz May 27, 2025
5bc9a1b
Update import statements in `SkyReelsV2DiffusionForcingPipeline` docu…
tolgacangoz May 27, 2025
0cdfb99
Merge branch 'main' into skyreels-v2
tolgacangoz May 28, 2025
5c658c9
Refactor progress bar handling in `SkyReelsV2DiffusionForcingPipeline…
tolgacangoz May 28, 2025
b30a426
update templates for i2v, v2v
tolgacangoz May 28, 2025
238d07d
Add `retrieve_latents` function to streamline latent retrieval in `Sk…
tolgacangoz May 28, 2025
d3bd638
Add `retrieve_latents` function to both i2v and v2v pipelines for con…
tolgacangoz May 28, 2025
2aab1de
Remove redundant ValueError for `overlap_history` in `SkyReelsV2Diffu…
tolgacangoz May 28, 2025
8ab5bb1
Update default video dimensions and flow matching scheduler parameter…
tolgacangoz May 28, 2025
323ec66
Refactor `SkyReelsV2DiffusionForcingPipeline` to support Image-to-Vid…
tolgacangoz May 28, 2025
ce804ad
Improve organization for image-last_image condition.
tolgacangoz May 28, 2025
ff97206
Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to improve …
tolgacangoz May 28, 2025
5d702cf
style
tolgacangoz May 28, 2025
b6536ed
Merge branch 'main' into skyreels-v2
tolgacangoz May 28, 2025
9d35809
style
tolgacangoz May 28, 2025
0f915f6
Add example usage of PIL for image input in `SkyReelsV2DiffusionForci…
tolgacangoz May 28, 2025
9a6746b
Refactor `SkyReelsV2DiffusionForcingPipeline` to `SkyReelsV2Diffusion…
tolgacangoz May 28, 2025
b879963
Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` by removing…
tolgacangoz May 29, 2025
7f35894
Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to enhance …
tolgacangoz May 29, 2025
a97d4d8
Enhance `SkyReelsV2DiffusionForcingPipeline` by refining latent prepa…
tolgacangoz May 29, 2025
e2bfbfa
refactor
tolgacangoz May 29, 2025
594082e
fix num_frames
tolgacangoz May 29, 2025
c4c9c0a
fix prefix_video_latents
tolgacangoz May 29, 2025
79960de
up
tolgacangoz May 29, 2025
f6cd857
refactor
tolgacangoz May 29, 2025
3ce9b05
Fix typo in scheduler method call within `SkyReelsV2DiffusionForcingV…
tolgacangoz May 29, 2025
f1b8508
up
tolgacangoz May 29, 2025
aad0feb
Enhance `SkyReelsV2DiffusionForcingImageToVideoPipeline` by adding su…
tolgacangoz May 29, 2025
0958647
add statistics
tolgacangoz May 30, 2025
fcfc7f4
Refine latent frame handling in `SkyReelsV2DiffusionForcingImageToVid…
tolgacangoz May 30, 2025
b197ffb
up
tolgacangoz May 30, 2025
54f1aa5
refactor
tolgacangoz May 30, 2025
0a45793
up
tolgacangoz May 30, 2025
37649b2
Refactor `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to improve …
tolgacangoz May 30, 2025
46c6e72
style
tolgacangoz May 30, 2025
0edb263
tolgacangoz May 31, 2025
4d724df
fix vae output indexing
tolgacangoz May 31, 2025
79dbd0e
upup
tolgacangoz May 31, 2025
22c761e
tolgacangoz May 31, 2025
fbf5cc1
tolgacangoz May 31, 2025
92c4e8c
up
tolgacangoz May 31, 2025
bb9ca6f
Fix tensor concatenation and repetition logic in `SkyReelsV2Diffusion…
tolgacangoz May 31, 2025
18e525f
Refactor latent retrieval logic in `SkyReelsV2DiffusionForcingVideoTo…
tolgacangoz May 31, 2025
528a811
Enhance logging in `SkyReelsV2DiffusionForcing` pipelines by adding i…
tolgacangoz Jun 1, 2025
7814f7d
Update latent handling in `SkyReelsV2DiffusionForcingImageToVideoPipe…
tolgacangoz Jun 1, 2025
6d1d1e9
Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize `get_1d_sincos…
tolgacangoz Jun 2, 2025
82d86e4
Enhance `get_1d_sincos_pos_embed_from_grid` function to include an op…
tolgacangoz Jun 2, 2025
2dce751
Update timestep projection in `SkyReelsV2TimeTextImageEmbedding` to i…
tolgacangoz Jun 2, 2025
a4aa0ba
Refactor tensor type handling in `SkyReelsV2AttnProcessor2_0` and `Sk…
tolgacangoz Jun 2, 2025
63814a0
Update tensor type in `SkyReelsV2RotaryPosEmbed` to use `torch.float3…
tolgacangoz Jun 2, 2025
a74248f
Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize automatic mixe…
tolgacangoz Jun 2, 2025
efccb9e
down
tolgacangoz Jun 3, 2025
b836618
down
tolgacangoz Jun 3, 2025
11cd6fb
style
tolgacangoz Jun 3, 2025
786f145
Add debug tensor tracking to `SkyReelsV2Transformer3DModel` for enhan…
tolgacangoz Jun 3, 2025
b597f9e
up
tolgacangoz Jun 3, 2025
6caffc9
Refactor indentation in `SkyReelsV2AttnProcessor2_0` to improve code …
tolgacangoz Jun 3, 2025
848acfc
Convert query, key, and value tensors to bfloat16 in `SkyReelsV2AttnP…
tolgacangoz Jun 3, 2025
a8e01ba
Add debug print statements in `SkyReelsV2TransformerBlock` to track t…
tolgacangoz Jun 4, 2025
aef60a1
debug
tolgacangoz Jun 4, 2025
f70627a
tolgacangoz Jun 4, 2025
7a98f19
debug
tolgacangoz Jun 4, 2025
17e931a
Remove commented-out debug tensor tracking from `SkyReelsV2Transforme…
tolgacangoz Jun 4, 2025
19dae16
Add functionality to save processed video latents as a Safetensors fi…
tolgacangoz Jun 5, 2025
2947e52
up
tolgacangoz Jun 5, 2025
324e7fe
Add functionality to save output latents as a Safetensors file in `Sk…
tolgacangoz Jun 5, 2025
abf59a5
up
tolgacangoz Jun 5, 2025
e227b38
Remove additional commented-out debug tensor tracking from `SkyReelsV…
tolgacangoz Jun 5, 2025
c6ef3cf
style
tolgacangoz Jun 5, 2025
f359c77
cleansing
tolgacangoz Jun 5, 2025
8e3b63f
Merge branch 'main' into skyreels-v2
tolgacangoz Jun 5, 2025
2fa1b38
Update example documentation and parameters in `SkyReelsV2Pipeline`. …
tolgacangoz Jun 5, 2025
4b0a775
Update shift parameter in example documentation and default values ac…
tolgacangoz Jun 5, 2025
ee56e4b
Update example documentation in SkyReels V2 pipelines to include avai…
tolgacangoz Jun 6, 2025
1dadfc2
Add test templates
tolgacangoz Jun 6, 2025
0f86f01
Merge branch 'main' into skyreels-v2
tolgacangoz Jun 6, 2025
619a571
style
tolgacangoz Jun 6, 2025
974fa00
Add docs template
tolgacangoz Jun 6, 2025
3b0ee61
Merge branch 'main' into skyreels-v2
tolgacangoz Jun 6, 2025
6e84a82
Add SkyReels V2 Diffusion Forcing Video-to-Video Pipeline to imports
tolgacangoz Jun 6, 2025
8758da7
style
tolgacangoz Jun 6, 2025
568c59e
fix-copies
tolgacangoz Jun 6, 2025
7759617
convert i2v 1.3b
tolgacangoz Jun 7, 2025
943cd3e
Update transformer configuration to include `image_dim` for SkyReels …
tolgacangoz Jun 7, 2025
993d19d
Refactor transformer import in SkyReels V2 pipeline to use `SkyReelsV…
tolgacangoz Jun 7, 2025
7387e52
Update transformer configuration in SkyReels V2 to increase `in_chann…
tolgacangoz Jun 7, 2025
96af7eb
Update transformer configuration in SkyReels V2 to set `added_kv_proj…
tolgacangoz Jun 7, 2025
a6a7337
up
tolgacangoz Jun 7, 2025
72ad13c
up
tolgacangoz Jun 7, 2025
d069905
up
tolgacangoz Jun 7, 2025
8142720
Add SkyReelsV2Pipeline support for T2V model type in conversion script
tolgacangoz Jun 7, 2025
326b6ed
upp
tolgacangoz Jun 7, 2025
a462222
Refactor model type checks in conversion script to use substring matc…
tolgacangoz Jun 7, 2025
a8c057f
upp
tolgacangoz Jun 7, 2025
6bdfbcf
Fix shard path formatting in conversion script to accommodate varying…
tolgacangoz Jun 7, 2025
db74f87
Update sharded safetensors loading logic in conversion script to use …
tolgacangoz Jun 7, 2025
cc698b6
Update scheduler parameters in SkyReels V2 test files for consistency…
tolgacangoz Jun 8, 2025
9a269a2
Refactor conversion script to initialize text encoder, tokenizer, and…
tolgacangoz Jun 8, 2025
9fd9dba
style
tolgacangoz Jun 8, 2025
bc9eb42
Update documentation for SkyReels-V2, introducing the Infinite-length…
tolgacangoz Jun 8, 2025
de446ad
Add SkyReelsV2Transformer3DModel and FlowMatchUniPCMultistepScheduler…
tolgacangoz Jun 8, 2025
f2f6613
style
tolgacangoz Jun 8, 2025
b707a6c
Update documentation for SkyReelsV2DiffusionForcingPipeline to correc…
tolgacangoz Jun 8, 2025
dc73267
Add documentation for causal_block_size parameter in SkyReelsV2DF pip…
tolgacangoz Jun 8, 2025
c2aab89
Simplify min_ar_step calculation in SkyReelsV2DiffusionForcingPipelin…
tolgacangoz Jun 9, 2025
7ce7a96
style and fix-copies
tolgacangoz Jun 9, 2025
32a6520
style
tolgacangoz Jun 9, 2025
ca1a5f4
Merge branch 'main' into skyreels-v2
tolgacangoz Jun 10, 2025
87e7d08
Merge branch 'main' into skyreels-v2
yiyixuxu Jun 11, 2025
59c4057
Add documentation for SkyReelsV2Transformer3DModel
tolgacangoz Jun 12, 2025
0a7647b
Merge branch 'main' into skyreels-v2
tolgacangoz Jun 12, 2025
9b026e4
Update test configurations for SkyReelsV2 pipelines
tolgacangoz Jun 12, 2025
4c89187
Refines SkyReelsV2DF test parameters
tolgacangoz Jun 12, 2025
6aec002
Update src/diffusers/models/modeling_outputs.py
tolgacangoz Jun 13, 2025
8fcc7f0
Refactor `grid_sizes` processing by using already-calculated post-pat…
tolgacangoz Jun 13, 2025
b5df175
Update docs/source/en/api/pipelines/skyreels_v2.md
tolgacangoz Jun 13, 2025
c446fe5
Refactor parameter naming for diffusion forcing in SkyReelsV2 pipelines
tolgacangoz Jun 13, 2025
7f13e1d
Revert _toctree.yml to adjust section expansion states
tolgacangoz Jun 14, 2025
6931366
style
tolgacangoz Jun 14, 2025
c42f98f
Update docs/source/en/api/models/skyreels_v2_transformer_3d.md
tolgacangoz Jun 14, 2025
9a5b93d
Add copying label to SkyReelsV2ImageEmbedding from WanImageEmbedding.
tolgacangoz Jun 14, 2025
fdabf03
Refactor transformer block processing in SkyReelsV2Transformer3DModel
tolgacangoz Jun 14, 2025
46b32ad
Update SkyReels V2 documentation to remove VRAM requirement and strea…
tolgacangoz Jun 14, 2025
194a9cc
Add SkyReelsV2LoraLoaderMixin for loading and managing LoRA layers in…
tolgacangoz Jun 14, 2025
0e9acff
Update SkyReelsV2 documentation and loader mixin references
tolgacangoz Jun 14, 2025
4dbe834
Enhance SkyReelsV2 integration by adding SkyReelsV2LoraLoaderMixin re…
tolgacangoz Jun 14, 2025
b6675c0
Update SkyReelsV2 model references in documentation
tolgacangoz Jun 14, 2025
7cb257c
style
tolgacangoz Jun 14, 2025
605b97d
Merge branch 'main' into skyreels-v2
tolgacangoz Jun 14, 2025
023336f
fix-copies
tolgacangoz Jun 14, 2025
4fd94bf
Refactor `fps_projection` in `SkyReelsV2Transformer3DModel`
tolgacangoz Jun 15, 2025
74c2209
Update docs
tolgacangoz Jun 16, 2025
ebc7714
Refactor video processing in SkyReelsV2DiffusionForcingPipeline
tolgacangoz Jun 16, 2025
fa715d6
Update activation function in `fps_projection` of `SkyReelsV2Transfor…
tolgacangoz Jun 16, 2025
56ea438
Add fps_projection layer renaming in convert_skyreelsv2_to_diffusers.py
tolgacangoz Jun 16, 2025
829d632
Fix fps_projection assignment in SkyReelsV2Transformer3DModel
tolgacangoz Jun 16, 2025
0b7d7ea
Update _keep_in_fp32_modules in SkyReelsV2Transformer3DModel
tolgacangoz Jun 16, 2025
6a1f857
Remove integration test classes from SkyReelsV2 test files
tolgacangoz Jun 16, 2025
2d35933
style
tolgacangoz Jun 16, 2025
a648bc6
Merge branch 'main' into skyreels-v2
tolgacangoz Jun 17, 2025
6 changes: 6 additions & 0 deletions docs/source/en/_toctree.yml
Expand Up @@ -329,6 +329,8 @@
title: SanaTransformer2DModel
- local: api/models/sd3_transformer2d
title: SD3Transformer2DModel
- local: api/models/skyreels_v2_transformer_3d
title: SkyReelsV2Transformer3DModel
- local: api/models/stable_audio_transformer
title: StableAudioDiTModel
- local: api/models/transformer2d
Expand Down Expand Up @@ -523,6 +525,8 @@
title: Semantic Guidance
- local: api/pipelines/shap_e
title: Shap-E
- local: api/pipelines/skyreels_v2
title: SkyReels-V2
- local: api/pipelines/stable_audio
title: Stable Audio
- local: api/pipelines/stable_cascade
Expand Down Expand Up @@ -626,6 +630,8 @@
title: FlowMatchEulerDiscreteScheduler
- local: api/schedulers/flow_match_heun_discrete
title: FlowMatchHeunDiscreteScheduler
- local: api/schedulers/flow_match_unipc
title: FlowMatchUniPCMultistepScheduler
- local: api/schedulers/heun
title: HeunDiscreteScheduler
- local: api/schedulers/ipndm
Expand Down
11 changes: 6 additions & 5 deletions docs/source/en/api/loaders/lora.md
Expand Up @@ -26,6 +26,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`HunyuanVideoLoraLoaderMixin`] provides similar functions for [HunyuanVideo](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hunyuan_video).
- [`Lumina2LoraLoaderMixin`] provides similar functions for [Lumina2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/lumina2).
- [`WanLoraLoaderMixin`] provides similar functions for [Wan](https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan).
- [`SkyReelsV2LoraLoaderMixin`] provides similar functions for [SkyReels-V2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/skyreels_v2).
- [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
Expand Down Expand Up @@ -88,6 +89,10 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin

## SkyReelsV2LoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.SkyReelsV2LoraLoaderMixin

## AmusedLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
Expand All @@ -98,8 +103,4 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

## LoraBaseMixin

[[autodoc]] loaders.lora_base.LoraBaseMixin

## WanLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.WanLoraLoaderMixin
[[autodoc]] loaders.lora_base.LoraBaseMixin
30 changes: 30 additions & 0 deletions docs/source/en/api/models/skyreels_v2_transformer_3d.md
@@ -0,0 +1,30 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# SkyReelsV2Transformer3DModel

A Diffusion Transformer model for 3D video-like data, introduced in [SkyReels-V2](https://github.com/SkyworkAI/SkyReels-V2) by Skywork AI.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import SkyReelsV2Transformer3DModel

transformer = SkyReelsV2Transformer3DModel.from_pretrained("Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## SkyReelsV2Transformer3DModel

[[autodoc]] SkyReelsV2Transformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
249 changes: 249 additions & 0 deletions docs/source/en/api/pipelines/skyreels_v2.md
@@ -0,0 +1,249 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</a>
</div>
</div>

# SkyReels-V2: Infinite-length Film Generative Model

[SkyReels-V2](https://huggingface.co/papers/2504.13074) by the SkyReels Team.

*Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming from general-purpose MLLMs' inability to interpret cinematic grammar, such as shot composition, actor expressions, and camera motions. These intertwined limitations hinder realistic long-form synthesis and professional film-style generation. To address these limitations, we propose SkyReels-V2, an Infinite-length Film Generative Model, that synergizes Multi-modal Large Language Model (MLLM), Multi-stage Pretraining, Reinforcement Learning, and Diffusion Forcing Framework. Firstly, we design a comprehensive structural representation of video that combines the general descriptions by the Multi-modal LLM and the detailed shot language by sub-expert models. Aided with human annotation, we then train a unified Video Captioner, named SkyCaptioner-V1, to efficiently label the video data. Secondly, we establish progressive-resolution pretraining for the fundamental video generation, followed by a four-stage post-training enhancement: Initial concept-balanced Supervised Fine-Tuning (SFT) improves baseline quality; Motion-specific Reinforcement Learning (RL) training with human-annotated and synthetic distortion data addresses dynamic artifacts; Our diffusion forcing framework with non-decreasing noise schedules enables long-video synthesis in an efficient search space; Final high-quality SFT refines visual fidelity. All the code and models are available at [this https URL](https://github.com/SkyworkAI/SkyReels-V2).*

You can find all the original SkyReels-V2 checkpoints under the [Skywork](https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9) organization.

The following SkyReels-V2 models are supported in Diffusers:
- [SkyReels-V2 DF 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers)
- [SkyReels-V2 DF 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P-Diffusers)
- [SkyReels-V2 DF 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-720P-Diffusers)
- [SkyReels-V2 T2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-540P-Diffusers)
- [SkyReels-V2 T2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-720P-Diffusers)
- [SkyReels-V2 I2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P-Diffusers)
- [SkyReels-V2 I2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P-Diffusers)
- [SkyReels-V2 I2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-720P-Diffusers)
Comment on lines +32 to +39
Member

cc @yiyixuxu Do we have contact with the SkyReels team and do we know if they would be okay with hosting the weights? If it's not possible, we could maintain skyreels-community org similar to hunyuan

Collaborator

I think so, let me check


> [!TIP]
> Click on the SkyReels-V2 models in the right sidebar for more examples of video generation.

### Text-to-Video Generation

The example below demonstrates how to generate a video from text optimized for memory or inference speed.

<hfoptions id="T2V usage">
<hfoption id="T2V memory">

Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques.

```py
# pip install ftfy
import torch
import numpy as np
from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline
from diffusers.hooks.group_offloading import apply_group_offloading
from diffusers.utils import export_to_video
from transformers import UMT5EncoderModel

text_encoder = UMT5EncoderModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="text_encoder", torch_dtype=torch.bfloat16)
vae = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32)
transformer = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)

# group-offloading
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
apply_group_offloading(
    text_encoder,
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="block_level",
    num_blocks_per_group=4,
)
transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

# Name the pipeline `pipe` so the calls below refer to a defined variable
pipe = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    "Skywork/SkyReels-V2-DF-14B-540P-Diffusers",
    vae=vae,
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")
pipe.transformer.set_ar_attention(causal_block_size=5)

prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window."

output = pipe(
    prompt=prompt,
    num_inference_steps=30,
    height=544,
    width=960,
    num_frames=97,
    ar_step=5,  # Controls asynchronous inference (0 for synchronous mode)
    overlap_history=None,  # Number of frames to overlap for smooth transitions; set to 17 for long videos
    addnoise_condition=20,  # Improves consistency in long video generation
).frames[0]
export_to_video(output, "T2V.mp4", fps=24, quality=8)
```
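
The relationship between `num_frames` and `causal_block_size` in the example above can be sketched with a little arithmetic. The block below is only an illustration, not the pipeline's implementation: it assumes the VAE compresses time by a factor of 4 (the constant `TEMPORAL_COMPRESSION` is an assumption, not something stated on this page), and the helpers `num_latent_frames` and `blockwise_causal_mask` are hypothetical names showing a generic block-wise causal mask at frame granularity.

```python
# Illustrative arithmetic only; nothing here is taken from the diffusers source.
TEMPORAL_COMPRESSION = 4  # ASSUMPTION: VAE temporal downsampling factor


def num_latent_frames(num_frames: int, temporal_compression: int = TEMPORAL_COMPRESSION) -> int:
    """Map a pixel-space frame count to a latent frame count (the first frame maps to its own latent)."""
    return (num_frames - 1) // temporal_compression + 1


def blockwise_causal_mask(latent_frames: int, causal_block_size: int) -> list[list[bool]]:
    """Generic block-wise causal mask: frame i may attend to frame j
    when j's block index is not later than i's block index."""
    block_of = [f // causal_block_size for f in range(latent_frames)]
    return [
        [block_of[j] <= block_of[i] for j in range(latent_frames)]
        for i in range(latent_frames)
    ]


latent_frames = num_latent_frames(97)  # 97 pixel frames, as in the example above
mask = blockwise_causal_mask(latent_frames, causal_block_size=5)
print(latent_frames, latent_frames // 5)  # prints "25 5" under the assumed 4x compression
```

Under these assumptions, 97 frames give 25 latent frames, which divide evenly into 5 blocks of 5. Frames in the same block see each other, and later blocks see all earlier ones.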

</hfoption>
</hfoptions>

### First-Last-Frame-to-Video Generation

The example below demonstrates how to use the image-to-video pipeline to generate a video using a text description, a starting frame, and an ending frame.

<hfoptions id="FLF2V usage">
<hfoption id="usage">

```python
import numpy as np
import torch
import torchvision.transforms.functional as TF
from diffusers import AutoencoderKLWan, SkyReelsV2DiffusionForcingImageToVideoPipeline
from diffusers.utils import export_to_video, load_image


model_id = "Skywork/SkyReels-V2-DF-14B-720P-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = SkyReelsV2DiffusionForcingImageToVideoPipeline.from_pretrained(
    model_id, vae=vae, torch_dtype=torch.bfloat16
)
pipe.to("cuda")

first_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
last_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")

def aspect_ratio_resize(image, pipe, max_area=720 * 1280):
    # Fit the image into max_area while keeping its aspect ratio, rounding each
    # side down to a multiple of the VAE scale factor times the patch size
    aspect_ratio = image.height / image.width
    mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]
    height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    image = image.resize((width, height))
    return image, height, width

def center_crop_resize(image, height, width):
    # Scale the image so that it fully covers the target (height, width)
    resize_ratio = max(width / image.width, height / image.height)
    resized_width = round(image.width * resize_ratio)
    resized_height = round(image.height * resize_ratio)
    image = image.resize((resized_width, resized_height))

    # Center-crop to exactly the first frame's dimensions; TF.center_crop expects (height, width)
    image = TF.center_crop(image, [height, width])

    return image, height, width

first_frame, height, width = aspect_ratio_resize(first_frame, pipe)
if last_frame.size != first_frame.size:
    last_frame, _, _ = center_crop_resize(last_frame, height, width)

prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective."

output = pipe(
    image=first_frame, last_image=last_frame, prompt=prompt, height=height, width=width, guidance_scale=5.0
).frames[0]
export_to_video(output, "output.mp4", fps=24)
```

</hfoption>
</hfoptions>
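
To see what the resizing arithmetic in `aspect_ratio_resize` produces without loading a model, the size computation can be isolated into a pure function. This is a minimal sketch: `target_size` is a hypothetical helper name, and `mod_value=16` is an assumption standing in for `pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]`, whose actual value depends on the loaded checkpoint.

```python
import math


def target_size(image_width, image_height, max_area=720 * 1280, mod_value=16):
    """Return (height, width) that keep the aspect ratio within max_area.

    mod_value=16 is an assumed value for illustration; read the real one
    from the loaded pipeline.
    """
    aspect_ratio = image_height / image_width
    # Solve height * width = max_area with height / width = aspect_ratio,
    # then round each side down to a multiple of mod_value.
    height = round(math.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(math.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return height, width


print(target_size(1920, 1080))  # (720, 1280)
print(target_size(1000, 1000))  # (960, 960)
```

Because each side is rounded down, the resulting area never exceeds `max_area`, and the aspect ratio may shift slightly for inputs whose ideal size is not a multiple of `mod_value`.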


## Notes

- SkyReels-V2 supports LoRAs with [`~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights`].

<details>
<summary>Show example code</summary>

```py
# pip install ftfy
import torch
from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline
from diffusers.utils import export_to_video

vae = AutoModel.from_pretrained(
    "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32
)
pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained(
    "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", vae=vae, torch_dtype=torch.bfloat16
)
pipeline.to("cuda")

pipeline.load_lora_weights("benjamin-paine/steamboat-willie-1.3b", adapter_name="steamboat-willie")
pipeline.set_adapters("steamboat-willie")

pipeline.enable_model_cpu_offload()

# use "steamboat willie style" to trigger the LoRA
prompt = """
steamboat willie style, golden era animation, The camera rushes from far to near in a low-angle shot,
revealing a white ferret on a log. It plays, leaps into the water, and emerges, as the camera zooms in
for a close-up. Water splashes berry bushes nearby, while moss, snow, and leaves blanket the ground.
Birch trees and a light blue sky frame the scene, with ferns in the foreground. Side lighting casts dynamic
shadows and warm highlights. Medium composition, front view, low angle, with depth of field.
"""

output = pipeline(
    prompt=prompt,
    num_frames=97,
    guidance_scale=6.0,
).frames[0]
export_to_video(output, "output.mp4", fps=24)
```

</details>


## SkyReelsV2DiffusionForcingPipeline

[[autodoc]] SkyReelsV2DiffusionForcingPipeline
  - all
  - __call__

## SkyReelsV2DiffusionForcingImageToVideoPipeline

[[autodoc]] SkyReelsV2DiffusionForcingImageToVideoPipeline
  - all
  - __call__

## SkyReelsV2DiffusionForcingVideoToVideoPipeline

[[autodoc]] SkyReelsV2DiffusionForcingVideoToVideoPipeline
  - all
  - __call__

## SkyReelsV2Pipeline

[[autodoc]] SkyReelsV2Pipeline
  - all
  - __call__

## SkyReelsV2ImageToVideoPipeline

[[autodoc]] SkyReelsV2ImageToVideoPipeline
  - all
  - __call__

## SkyReelsV2PipelineOutput

[[autodoc]] pipelines.skyreels_v2.pipeline_output.SkyReelsV2PipelineOutput
18 changes: 18 additions & 0 deletions docs/source/en/api/schedulers/flow_match_unipc.md
@@ -0,0 +1,18 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# FlowMatchUniPCMultistepScheduler

`FlowMatchUniPCMultistepScheduler` is based on the flow-matching sampling introduced in [Stable Diffusion 3](https://huggingface.co/papers/2403.03206).

## FlowMatchUniPCMultistepScheduler
[[autodoc]] FlowMatchUniPCMultistepScheduler