Add SkyReels V2: Infinite-Length Film Generative Model #11518
Open
tolgacangoz
wants to merge
283
commits into
huggingface:main
from
tolgacangoz:skyreels-v2
+8,152
−6
ded93bc
Enhance `SkyReelsV2DiffusionForcingImageToVideoPipeline` by refining …
tolgacangoz c9483b2
Remove unused dtype handling in `SkyReelsV2DiffusionForcingPipeline` …
tolgacangoz f7fed01
up
tolgacangoz 0e7b21d
up
tolgacangoz b3698d7
Update references
tolgacangoz 7e0f0f5
Add `generate_timestep_matrix` method to `SkyReelsV2DiffusionForcingP…
tolgacangoz 47080c2
Merge branch 'main' into skyreels-v2
tolgacangoz 8c23208
Remove training-related code
tolgacangoz 1f8e268
Add gradient checkpointing support in `SkyReelsV2Transformer3DModel` …
tolgacangoz d853521
Refactor `SkyReelsV2TransformerBlock` and remove unused `Head` class.…
tolgacangoz 2b79584
Remove unused parameter `y` and associated documentation from `SkyRee…
tolgacangoz 600ced3
Update context length calculation in `SkyReelsV2AttnProcessor2_0` to …
tolgacangoz 586fe56
Fix comparison logic in `SkyReelsV2AttnProcessor2_0` to correctly det…
tolgacangoz afcaf6e
Remove unused `flex_attention` variable from `transformer_skyreels_v2…
tolgacangoz 465df8c
Updates SkyReelsV2 pipeline defaults and docs
tolgacangoz cad2d38
Remove `enable_teacache` functionality from `SkyReelsV2Transformer3DM…
tolgacangoz 1fcdf98
Refactor `SkyReelsV2Transformer3DModel` to use configuration paramete…
tolgacangoz 6d57725
Remove unused import of `numpy` and clean up whitespace in `transform…
tolgacangoz c4cec04
Refactor `SkyReelsV2DiffusionForcingPipeline` to improve error handli…
tolgacangoz 6a85ba1
Refactor `SkyReelsV2DiffusionForcingPipeline` to enhance sample sched…
tolgacangoz 76af29b
update template for df_i2v
tolgacangoz 81206ce
style
tolgacangoz 906b6f5
Refactor `SkyReelsV2DiffusionForcingPipeline` to improve the handling…
tolgacangoz e2391b6
Add newly released `SkyReelsV2DiffusionForcingVideoToVideoPipeline` t…
tolgacangoz 245534f
up df_i2v
tolgacangoz aaa8a8b
Refactor `SkyReelsV2DiffusionForcingPipeline` to improve the handling…
tolgacangoz ca3f7bd
Integrate video decoding in pipeline
tolgacangoz b4e26fd
up
tolgacangoz c3bcd1d
Fix variable name typo in `SkyReelsV2DiffusionForcingPipeline` from `…
tolgacangoz c9bea14
Fix variable name from `casual_block_size` to `causal_block_size` for…
tolgacangoz 00fdeb0
Update `_no_split_modules` in `SkyReelsV2Transformer3DModel` and adju…
tolgacangoz cf91fb4
Refactor type hint for `device` parameter in `_prepare_blockwise_caus…
tolgacangoz 256fa6d
Refactor `SkyReelsV2DiffusionForcingPipeline` to streamline the setti…
tolgacangoz a74252c
Add `flag_df` parameter to `SkyReelsV2Transformer3DModel` for improve…
tolgacangoz 771fb05
Refactor `SkyReelsV2DiffusionForcingPipeline` to enhance clarity and …
tolgacangoz 8e61893
Merge branch 'main' into skyreels-v2
tolgacangoz bccad55
Add script for converting SkyReelsV2 models to Diffusers format
tolgacangoz 59c1e88
down
tolgacangoz 02f038d
Update documentation in `SkyReelsV2DiffusionForcingPipeline` to clari…
tolgacangoz 32ca01a
up
tolgacangoz 02ffe0c
Refactor model directory path handling in `convert_transformer` funct…
tolgacangoz a215677
fix "inject_sample_info": true,
tolgacangoz 1e4c501
temp fix
tolgacangoz 322ce0c
up
tolgacangoz b7d54d6
fix `qk_norm`
tolgacangoz be77ad8
Refactor `convert_skyreelsv2_to_diffusers.py` to use `SkyreelsV2Image…
tolgacangoz 6f8ffb2
for vae
tolgacangoz 4576f6e
for t5
tolgacangoz 10174ca
up
tolgacangoz 9223f2d
temp fix
tolgacangoz a1aadd3
up
tolgacangoz f369cc4
Remove assertion for 1D timesteps in `get_timestep_embedding` functio…
tolgacangoz eb32376
Refactor timestep handling in `SkyReelsV2DiffusionForcingPipeline` to…
tolgacangoz 671b37e
Enhance `get_timestep_embedding` to support 2D tensor inputs, allowin…
tolgacangoz 6f8bf30
Fix unflattening of timestep projection in `SkyReelsV2Transformer3DMo…
tolgacangoz c71d3aa
Update dtype handling in `SkyReelsV2Transformer3DModel` to ensure con…
tolgacangoz 1afa337
Refactor tensor reshaping in `SkyReelsV2Transformer3DModel` to utiliz…
tolgacangoz c74675c
Refactor timestep preparation in `SkyReelsV2DiffusionForcingPipeline`…
tolgacangoz 602cff7
fix: multi-dimentional indexing
tolgacangoz 237e468
Comment out tensor unsqueezing in `SkyReelsV2DiffusionForcingPipeline…
tolgacangoz 40c456d
Update dtype handling in `SkyReelsV2DiffusionForcingPipeline` to use …
tolgacangoz 9ed88da
fix dype
tolgacangoz 6a3c7bf
fix
tolgacangoz 5652aa0
Refactor sample scheduler initialization in `SkyReelsV2DiffusionForci…
tolgacangoz e529fea
Adds shift parameter to scheduler timestep setting
tolgacangoz b3ffeca
Fix slicing of latents in `SkyReelsV2DiffusionForcingPipeline` to ens…
tolgacangoz 4479afc
Fix tensor slicing in `SkyReelsV2DiffusionForcingPipeline` to ensure …
tolgacangoz e4f6743
Update progress bar total in `SkyReelsV2DiffusionForcingPipeline` to …
tolgacangoz 7420446
Refactor error handling and tensor processing in `SkyReelsV2Diffusion…
tolgacangoz 2d59ebd
Refactor tensor processing and noise application in `SkyReelsV2Diffus…
tolgacangoz 8af4a9f
Refactor variable naming and tensor handling in `SkyReelsV2DiffusionF…
tolgacangoz 57a2bf9
style
tolgacangoz ae6adbe
fix number of frames for long video generation
tolgacangoz 9afb214
up
tolgacangoz f1483ad
fix: `latents` initialization for long video generation in processing…
tolgacangoz a16c31b
update templates
tolgacangoz 3b7b63b
Enhance `convert_skyreelsv2_to_diffusers.py` by adding support for lo…
tolgacangoz 5e1126d
Update model configuration in `convert_skyreelsv2_to_diffusers.py` to…
tolgacangoz 820d415
Refactor `set_ar_attention` method in `SkyReelsV2Transformer3DModel` …
tolgacangoz 528e0d7
up
tolgacangoz 6c4301c
up
tolgacangoz 7d5328f
upp
tolgacangoz 00849fd
fix file name
tolgacangoz 8e34d89
Update `SkyReelsV2Transformer3DModel` to conditionally apply `causal_…
tolgacangoz 493a08c
Merge branch 'main' into skyreels-v2
tolgacangoz a6f0d11
style
tolgacangoz cc0660c
Fix class name casing for SkyReelsV2 components in multiple files to …
tolgacangoz 14d8d7a
cleaning
tolgacangoz 85a1f90
cleansing
tolgacangoz 5264ac9
Refactor `get_timestep_embedding` to move modifications into `SkyReel…
tolgacangoz 81acfae
Remove unnecessary line break in `get_timestep_embedding` function fo…
tolgacangoz 11baa00
Remove `skyreels_v2` entry from `_import_structure` and update its in…
tolgacangoz 2906c37
cleansing
tolgacangoz a38eaab
Refactor attention processing in `SkyReelsV2AttnProcessor2_0` to alwa…
tolgacangoz 150ea56
Enhance example usage in `pipeline_skyreels_v2_diffusion_forcing.py` …
tolgacangoz ad7d4c4
Refactor import structure in `__init__.py` for SkyReelsV2 components …
tolgacangoz ed7843a
Merge branch 'main' into skyreels-v2
tolgacangoz f1ee024
Update `guidance_scale` parameter in `SkyReelsV2DiffusionForcingPipel…
tolgacangoz 421e0dc
Update `guidance_scale` parameter in example documentation and class …
tolgacangoz 4b688c4
Update `causal_block_size` parameter in `SkyReelsV2DiffusionForcingPi…
tolgacangoz c6b5391
up
tolgacangoz 3bf1e4a
Fix dtype conversion for `timestep_proj` in `SkyReelsV2Transformer3DM…
tolgacangoz f48363c
Optimize causal mask generation by replacing repeated tensor with `re…
tolgacangoz 920d956
style
tolgacangoz cedee34
Merge branch 'main' into skyreels-v2
tolgacangoz db9cda9
Enhance example documentation in `SkyReelsV2DiffusionForcingPipeline`…
tolgacangoz ff6eeea
Refactor sample scheduler creation in `SkyReelsV2DiffusionForcingPipe…
tolgacangoz 82db3ab
Merge branch 'main' into skyreels-v2
tolgacangoz c0abccc
Enhance error handling and documentation in `SkyReelsV2DiffusionForci…
tolgacangoz 35061d0
Update documentation and progress bar handling in `SkyReelsV2Diffusio…
tolgacangoz cede08c
Refine progress bar calculation in `SkyReelsV2DiffusionForcingPipelin…
tolgacangoz 5bc9a1b
Update import statements in `SkyReelsV2DiffusionForcingPipeline` docu…
tolgacangoz 0cdfb99
Merge branch 'main' into skyreels-v2
tolgacangoz 5c658c9
Refactor progress bar handling in `SkyReelsV2DiffusionForcingPipeline…
tolgacangoz b30a426
update templates for i2v, v2v
tolgacangoz 238d07d
Add `retrieve_latents` function to streamline latent retrieval in `Sk…
tolgacangoz d3bd638
Add `retrieve_latents` function to both i2v and v2v pipelines for con…
tolgacangoz 2aab1de
Remove redundant ValueError for `overlap_history` in `SkyReelsV2Diffu…
tolgacangoz 8ab5bb1
Update default video dimensions and flow matching scheduler parameter…
tolgacangoz 323ec66
Refactor `SkyReelsV2DiffusionForcingPipeline` to support Image-to-Vid…
tolgacangoz ce804ad
Improve organization for image-last_image condition.
tolgacangoz ff97206
Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to improve …
tolgacangoz 5d702cf
style
tolgacangoz b6536ed
Merge branch 'main' into skyreels-v2
tolgacangoz 9d35809
style
tolgacangoz 0f915f6
Add example usage of PIL for image input in `SkyReelsV2DiffusionForci…
tolgacangoz 9a6746b
Refactor `SkyReelsV2DiffusionForcingPipeline` to `SkyReelsV2Diffusion…
tolgacangoz b879963
Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` by removing…
tolgacangoz 7f35894
Refactor `SkyReelsV2DiffusionForcingImageToVideoPipeline` to enhance …
tolgacangoz a97d4d8
Enhance `SkyReelsV2DiffusionForcingPipeline` by refining latent prepa…
tolgacangoz e2bfbfa
refactor
tolgacangoz 594082e
fix num_frames
tolgacangoz c4c9c0a
fix prefix_video_latents
tolgacangoz 79960de
up
tolgacangoz f6cd857
refactor
tolgacangoz 3ce9b05
Fix typo in scheduler method call within `SkyReelsV2DiffusionForcingV…
tolgacangoz f1b8508
up
tolgacangoz aad0feb
Enhance `SkyReelsV2DiffusionForcingImageToVideoPipeline` by adding su…
tolgacangoz 0958647
add statistics
tolgacangoz fcfc7f4
Refine latent frame handling in `SkyReelsV2DiffusionForcingImageToVid…
tolgacangoz b197ffb
up
tolgacangoz 54f1aa5
refactor
tolgacangoz 0a45793
up
tolgacangoz 37649b2
Refactor `SkyReelsV2DiffusionForcingVideoToVideoPipeline` to improve …
tolgacangoz 46c6e72
style
tolgacangoz 0edb263
4d724df
fix vae output indexing
tolgacangoz 79dbd0e
upup
tolgacangoz 22c761e
fbf5cc1
92c4e8c
up
tolgacangoz bb9ca6f
Fix tensor concatenation and repetition logic in `SkyReelsV2Diffusion…
tolgacangoz 18e525f
Refactor latent retrieval logic in `SkyReelsV2DiffusionForcingVideoTo…
tolgacangoz 528a811
Enhance logging in `SkyReelsV2DiffusionForcing` pipelines by adding i…
tolgacangoz 7814f7d
Update latent handling in `SkyReelsV2DiffusionForcingImageToVideoPipe…
tolgacangoz 6d1d1e9
Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize `get_1d_sincos…
tolgacangoz 82d86e4
Enhance `get_1d_sincos_pos_embed_from_grid` function to include an op…
tolgacangoz 2dce751
Update timestep projection in `SkyReelsV2TimeTextImageEmbedding` to i…
tolgacangoz a4aa0ba
Refactor tensor type handling in `SkyReelsV2AttnProcessor2_0` and `Sk…
tolgacangoz 63814a0
Update tensor type in `SkyReelsV2RotaryPosEmbed` to use `torch.float3…
tolgacangoz a74248f
Refactor `SkyReelsV2TimeTextImageEmbedding` to utilize automatic mixe…
tolgacangoz efccb9e
down
tolgacangoz b836618
down
tolgacangoz 11cd6fb
style
tolgacangoz 786f145
Add debug tensor tracking to `SkyReelsV2Transformer3DModel` for enhan…
tolgacangoz b597f9e
up
tolgacangoz 6caffc9
Refactor indentation in `SkyReelsV2AttnProcessor2_0` to improve code …
tolgacangoz 848acfc
Convert query, key, and value tensors to bfloat16 in `SkyReelsV2AttnP…
tolgacangoz a8e01ba
Add debug print statements in `SkyReelsV2TransformerBlock` to track t…
tolgacangoz aef60a1
debug
tolgacangoz f70627a
7a98f19
debug
tolgacangoz 17e931a
Remove commented-out debug tensor tracking from `SkyReelsV2Transforme…
tolgacangoz 19dae16
Add functionality to save processed video latents as a Safetensors fi…
tolgacangoz 2947e52
up
tolgacangoz 324e7fe
Add functionality to save output latents as a Safetensors file in `Sk…
tolgacangoz abf59a5
up
tolgacangoz e227b38
Remove additional commented-out debug tensor tracking from `SkyReelsV…
tolgacangoz c6ef3cf
style
tolgacangoz f359c77
cleansing
tolgacangoz 8e3b63f
Merge branch 'main' into skyreels-v2
tolgacangoz 2fa1b38
Update example documentation and parameters in `SkyReelsV2Pipeline`. …
tolgacangoz 4b0a775
Update shift parameter in example documentation and default values ac…
tolgacangoz ee56e4b
Update example documentation in SkyReels V2 pipelines to include avai…
tolgacangoz 1dadfc2
Add test templates
tolgacangoz 0f86f01
Merge branch 'main' into skyreels-v2
tolgacangoz 619a571
style
tolgacangoz 974fa00
Add docs template
tolgacangoz 3b0ee61
Merge branch 'main' into skyreels-v2
tolgacangoz 6e84a82
Add SkyReels V2 Diffusion Forcing Video-to-Video Pipeline to imports
tolgacangoz 8758da7
style
tolgacangoz 568c59e
fix-copies
tolgacangoz 7759617
convert i2v 1.3b
tolgacangoz 943cd3e
Update transformer configuration to include `image_dim` for SkyReels …
tolgacangoz 993d19d
Refactor transformer import in SkyReels V2 pipeline to use `SkyReelsV…
tolgacangoz 7387e52
Update transformer configuration in SkyReels V2 to increase `in_chann…
tolgacangoz 96af7eb
Update transformer configuration in SkyReels V2 to set `added_kv_proj…
tolgacangoz a6a7337
up
tolgacangoz 72ad13c
up
tolgacangoz d069905
up
tolgacangoz 8142720
Add SkyReelsV2Pipeline support for T2V model type in conversion script
tolgacangoz 326b6ed
upp
tolgacangoz a462222
Refactor model type checks in conversion script to use substring matc…
tolgacangoz a8c057f
upp
tolgacangoz 6bdfbcf
Fix shard path formatting in conversion script to accommodate varying…
tolgacangoz db74f87
Update sharded safetensors loading logic in conversion script to use …
tolgacangoz cc698b6
Update scheduler parameters in SkyReels V2 test files for consistency…
tolgacangoz 9a269a2
Refactor conversion script to initialize text encoder, tokenizer, and…
tolgacangoz 9fd9dba
style
tolgacangoz bc9eb42
Update documentation for SkyReels-V2, introducing the Infinite-length…
tolgacangoz de446ad
Add SkyReelsV2Transformer3DModel and FlowMatchUniPCMultistepScheduler…
tolgacangoz f2f6613
style
tolgacangoz b707a6c
Update documentation for SkyReelsV2DiffusionForcingPipeline to correc…
tolgacangoz dc73267
Add documentation for causal_block_size parameter in SkyReelsV2DF pip…
tolgacangoz c2aab89
Simplify min_ar_step calculation in SkyReelsV2DiffusionForcingPipelin…
tolgacangoz 7ce7a96
style and fix-copies
tolgacangoz 32a6520
style
tolgacangoz ca1a5f4
Merge branch 'main' into skyreels-v2
tolgacangoz 87e7d08
Merge branch 'main' into skyreels-v2
yiyixuxu 59c4057
Add documentation for SkyReelsV2Transformer3DModel
tolgacangoz 0a7647b
Merge branch 'main' into skyreels-v2
tolgacangoz 9b026e4
Update test configurations for SkyReelsV2 pipelines
tolgacangoz 4c89187
Refines SkyReelsV2DF test parameters
tolgacangoz 6aec002
Update src/diffusers/models/modeling_outputs.py
tolgacangoz 8fcc7f0
Refactor `grid_sizes` processing by using already-calculated post-pat…
tolgacangoz b5df175
Update docs/source/en/api/pipelines/skyreels_v2.md
tolgacangoz c446fe5
Refactor parameter naming for diffusion forcing in SkyReelsV2 pipelines
tolgacangoz 7f13e1d
Revert _toctree.yml to adjust section expansion states
tolgacangoz 6931366
style
tolgacangoz c42f98f
Update docs/source/en/api/models/skyreels_v2_transformer_3d.md
tolgacangoz 9a5b93d
Add copying label to SkyReelsV2ImageEmbedding from WanImageEmbedding.
tolgacangoz fdabf03
Refactor transformer block processing in SkyReelsV2Transformer3DModel
tolgacangoz 46b32ad
Update SkyReels V2 documentation to remove VRAM requirement and strea…
tolgacangoz 194a9cc
Add SkyReelsV2LoraLoaderMixin for loading and managing LoRA layers in…
tolgacangoz 0e9acff
Update SkyReelsV2 documentation and loader mixin references
tolgacangoz 4dbe834
Enhance SkyReelsV2 integration by adding SkyReelsV2LoraLoaderMixin re…
tolgacangoz b6675c0
Update SkyReelsV2 model references in documentation
tolgacangoz 7cb257c
style
tolgacangoz 605b97d
Merge branch 'main' into skyreels-v2
tolgacangoz 023336f
fix-copies
tolgacangoz 4fd94bf
Refactor `fps_projection` in `SkyReelsV2Transformer3DModel`
tolgacangoz 74c2209
Update docs
tolgacangoz ebc7714
Refactor video processing in SkyReelsV2DiffusionForcingPipeline
tolgacangoz fa715d6
Update activation function in `fps_projection` of `SkyReelsV2Transfor…
tolgacangoz 56ea438
Add fps_projection layer renaming in convert_skyreelsv2_to_diffusers.py
tolgacangoz 829d632
Fix fps_projection assignment in SkyReelsV2Transformer3DModel
tolgacangoz 0b7d7ea
Update _keep_in_fp32_modules in SkyReelsV2Transformer3DModel
tolgacangoz 6a1f857
Remove integration test classes from SkyReelsV2 test files
tolgacangoz 2d35933
style
tolgacangoz a648bc6
Merge branch 'main' into skyreels-v2
tolgacangoz
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
<!-- Copyright 2024 The HuggingFace Team. All rights reserved. | ||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations under the License. --> | ||
|
||
# SkyReelsV2Transformer3DModel | ||
|
||
A Diffusion Transformer model for 3D video-like data, introduced in [SkyReels-V2](https://github.com/SkyworkAI/SkyReels-V2) by Skywork AI. | ||
|
||
The model can be loaded with the following code snippet. | ||
|
||
```python | ||
import torch | ||
from diffusers import SkyReelsV2Transformer3DModel | ||
|
||
transformer = SkyReelsV2Transformer3DModel.from_pretrained("Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16) | ||
``` | ||
|
||
## SkyReelsV2Transformer3DModel | ||
|
||
[[autodoc]] SkyReelsV2Transformer3DModel | ||
|
||
## Transformer2DModelOutput | ||
|
||
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,249 @@ | ||
<!-- Copyright 2024 The HuggingFace Team. All rights reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. --> | ||
|
||
<div style="float: right;"> | ||
<div class="flex flex-wrap space-x-1"> | ||
<a href="https://huggingface.co/docs/diffusers/main/en/tutorials/using_peft_for_inference" target="_blank" rel="noopener"> | ||
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/> | ||
</a> | ||
</div> | ||
</div> | ||
|
||
# SkyReels-V2: Infinite-length Film Generative model | ||
|
||
[SkyReels-V2](https://huggingface.co/papers/2504.13074) by the SkyReels Team. | ||
|
||
*Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming from general-purpose MLLMs' inability to interpret cinematic grammar, such as shot composition, actor expressions, and camera motions. These intertwined limitations hinder realistic long-form synthesis and professional film-style generation. To address these limitations, we propose SkyReels-V2, an Infinite-length Film Generative Model, that synergizes Multi-modal Large Language Model (MLLM), Multi-stage Pretraining, Reinforcement Learning, and Diffusion Forcing Framework. Firstly, we design a comprehensive structural representation of video that combines the general descriptions by the Multi-modal LLM and the detailed shot language by sub-expert models. Aided with human annotation, we then train a unified Video Captioner, named SkyCaptioner-V1, to efficiently label the video data. Secondly, we establish progressive-resolution pretraining for the fundamental video generation, followed by a four-stage post-training enhancement: Initial concept-balanced Supervised Fine-Tuning (SFT) improves baseline quality; Motion-specific Reinforcement Learning (RL) training with human-annotated and synthetic distortion data addresses dynamic artifacts; Our diffusion forcing framework with non-decreasing noise schedules enables long-video synthesis in an efficient search space; Final high-quality SFT refines visual fidelity. All the code and models are available at [this https URL](https://github.com/SkyworkAI/SkyReels-V2).* | ||
|
||
You can find all the original SkyReels-V2 checkpoints under the [Skywork](https://huggingface.co/collections/Skywork/skyreels-v2-6801b1b93df627d441d0d0d9) organization. | ||
|
||
The following SkyReels-V2 models are supported in Diffusers: | ||
- [SkyReels-V2 DF 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers) | ||
- [SkyReels-V2 DF 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-540P-Diffusers) | ||
- [SkyReels-V2 DF 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-DF-14B-720P-Diffusers) | ||
- [SkyReels-V2 T2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-540P-Diffusers) | ||
- [SkyReels-V2 T2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-T2V-14B-720P-Diffusers) | ||
- [SkyReels-V2 I2V 1.3B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P-Diffusers) | ||
- [SkyReels-V2 I2V 14B - 540P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P-Diffusers) | ||
- [SkyReels-V2 I2V 14B - 720P](https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-720P-Diffusers) | ||
|
||
> [!TIP] | ||
> Click on the SkyReels-V2 models in the right sidebar for more examples of video generation. | ||
|
||
### Text-to-Video Generation | ||
|
||
The example below demonstrates how to generate a video from text, using group offloading to reduce memory usage. | ||
|
||
<hfoptions id="T2V usage"> | ||
|
||
<hfoption id="T2V memory"> | ||
|
||
Refer to the [Reduce memory usage](../../optimization/memory) guide for more details about the various memory saving techniques. | ||
|
||
```py | ||
# pip install ftfy | ||
import torch | ||
from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline | ||
from diffusers.hooks.group_offloading import apply_group_offloading | ||
from diffusers.utils import export_to_video | ||
from transformers import UMT5EncoderModel | ||
|
||
text_encoder = UMT5EncoderModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="text_encoder", torch_dtype=torch.bfloat16) | ||
vae = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32) | ||
transformer = AutoModel.from_pretrained("Skywork/SkyReels-V2-DF-14B-540P-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16) | ||
|
||
# group-offloading | ||
onload_device = torch.device("cuda") | ||
offload_device = torch.device("cpu") | ||
apply_group_offloading( | ||
    text_encoder, | ||
    onload_device=onload_device, | ||
    offload_device=offload_device, | ||
    offload_type="block_level", | ||
    num_blocks_per_group=4, | ||
) | ||
transformer.enable_group_offload( | ||
    onload_device=onload_device, | ||
    offload_device=offload_device, | ||
    offload_type="leaf_level", | ||
    use_stream=True, | ||
) | ||
|
||
pipe = SkyReelsV2DiffusionForcingPipeline.from_pretrained( | ||
    "Skywork/SkyReels-V2-DF-14B-540P-Diffusers", | ||
    vae=vae, | ||
    transformer=transformer, | ||
    text_encoder=text_encoder, | ||
    torch_dtype=torch.bfloat16, | ||
) | ||
pipe = pipe.to("cuda") | ||
pipe.transformer.set_ar_attention(causal_block_size=5) | ||
|
||
prompt = "A cat and a dog baking a cake together in a kitchen. The cat is carefully measuring flour, while the dog is stirring the batter with a wooden spoon. The kitchen is cozy, with sunlight streaming through the window." | ||
|
||
output = pipe( | ||
    prompt=prompt, | ||
    num_inference_steps=30, | ||
    height=544, | ||
    width=960, | ||
    num_frames=97, | ||
    ar_step=5,  # Controls asynchronous inference (0 for synchronous mode) | ||
    overlap_history=None,  # Number of frames to overlap for smooth transitions in long videos (use 17 for long videos) | ||
    addnoise_condition=20,  # Improves consistency in long video generation | ||
).frames[0] | ||
export_to_video(output, "T2V.mp4", fps=24, quality=8) | ||
``` | ||
|
||
</hfoption> | ||
</hfoptions> | ||
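
The `set_ar_attention(causal_block_size=5)` call in the example above switches the transformer to autoregressive, block-wise causal attention. As a rough illustration of the idea, and not the implementation in this PR, the sketch below builds a frame-level mask in plain Python. It assumes that frames are grouped into consecutive blocks of `causal_block_size` and that a frame may attend to every frame in its own block and in earlier blocks. The function name `blockwise_causal_mask` is hypothetical.

```python
def blockwise_causal_mask(num_frames: int, causal_block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when frame i may attend to frame j.

    Assumption (illustration only): frames are split into consecutive blocks of
    `causal_block_size`; attention is full inside a block and causal across blocks.
    """
    if num_frames <= 0 or causal_block_size <= 0:
        raise ValueError("num_frames and causal_block_size must be positive")
    # Block index of every frame, e.g. [0, 0, 1, 1, 2, 2] for block size 2
    block_of = [frame // causal_block_size for frame in range(num_frames)]
    return [[block_of[j] <= block_of[i] for j in range(num_frames)] for i in range(num_frames)]


mask = blockwise_causal_mask(num_frames=6, causal_block_size=2)
for row in mask:
    # "x" marks an allowed query/key pair, "." a masked one
    print("".join("x" if allowed else "." for allowed in row))
```

With six frames and a block size of 2, frames 0 and 1 see only each other, frames 2 and 3 additionally see the first block, and the last block sees everything.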
|
||
### First-Last-Frame-to-Video Generation | ||
|
||
The example below demonstrates how to use the image-to-video pipeline to generate a video using a text description, a starting frame, and an ending frame. | ||
|
||
<hfoptions id="FLF2V usage"> | ||
<hfoption id="usage"> | ||
|
||
```python | ||
import numpy as np | ||
import torch | ||
import torchvision.transforms.functional as TF | ||
from diffusers import AutoencoderKLWan, SkyReelsV2DiffusionForcingImageToVideoPipeline | ||
from diffusers.utils import export_to_video, load_image | ||
|
||
|
||
model_id = "Skywork/SkyReels-V2-DF-14B-720P-Diffusers" | ||
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32) | ||
pipe = SkyReelsV2DiffusionForcingImageToVideoPipeline.from_pretrained( | ||
model_id, vae=vae, torch_dtype=torch.bfloat16 | ||
) | ||
pipe.to("cuda") | ||
|
||
first_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png") | ||
last_frame = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png") | ||
|
||
def aspect_ratio_resize(image, pipe, max_area=720 * 1280): | ||
    aspect_ratio = image.height / image.width | ||
    mod_value = pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1] | ||
    height = round(np.sqrt(max_area * aspect_ratio)) // mod_value * mod_value | ||
    width = round(np.sqrt(max_area / aspect_ratio)) // mod_value * mod_value | ||
    image = image.resize((width, height)) | ||
    return image, height, width | ||
|
||
def center_crop_resize(image, height, width): | ||
    # Scale the image so it fully covers the first frame's dimensions | ||
    resize_ratio = max(width / image.width, height / image.height) | ||
|
||
    # Resize, then center-crop to exactly (height, width) | ||
    resized_width = round(image.width * resize_ratio) | ||
    resized_height = round(image.height * resize_ratio) | ||
    image = image.resize((resized_width, resized_height)) | ||
    image = TF.center_crop(image, [height, width]) | ||
|
||
    return image, height, width | ||
|
||
first_frame, height, width = aspect_ratio_resize(first_frame, pipe) | ||
if last_frame.size != first_frame.size: | ||
last_frame, _, _ = center_crop_resize(last_frame, height, width) | ||
|
||
prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings. The bird's feathers are delicate, with a unique pattern on its chest. The background shows a blue sky with white clouds under bright sunshine. The camera follows the bird upward, capturing its flight and the vastness of the sky from a close-up, low-angle perspective." | ||
|
||
output = pipe( | ||
    image=first_frame, last_image=last_frame, prompt=prompt, height=height, width=width, guidance_scale=5.0 | ||
).frames[0] | ||
export_to_video(output, "output.mp4", fps=24) | ||
``` | ||
|
||
</hfoption> | ||
</hfoptions> | ||
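
The sizing arithmetic in `aspect_ratio_resize` can be checked without loading a pipeline. The sketch below repeats the same computation on plain integers. The helper name `fit_to_area` is hypothetical, and `mod_value=16` is an assumed stand-in for `pipe.vae_scale_factor_spatial * pipe.transformer.config.patch_size[1]`; read the real value from the loaded pipeline.

```python
import math


def fit_to_area(width: int, height: int, max_area: int = 720 * 1280, mod_value: int = 16) -> tuple[int, int]:
    """Return (height, width) near `max_area` that keeps the aspect ratio.

    Both values are rounded down to a multiple of `mod_value` (assumed to be 16 here).
    """
    aspect_ratio = height / width
    # Solve h * w = max_area with h / w = aspect_ratio, then snap to the grid
    new_height = round(math.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    new_width = round(math.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return new_height, new_width


print(fit_to_area(1280, 720))  # (720, 1280)
print(fit_to_area(1024, 768))  # (816, 1104)
```

A 4:3 input therefore ends up slightly off its exact aspect ratio, because height and width are snapped to the grid independently.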
|
||
|
||
## Notes | ||
|
||
- SkyReels-V2 supports LoRAs with [`~loaders.SkyReelsV2LoraLoaderMixin.load_lora_weights`]. | ||
|
||
<details> | ||
<summary>Show example code</summary> | ||
|
||
```py | ||
# pip install ftfy | ||
import torch | ||
from diffusers import AutoModel, SkyReelsV2DiffusionForcingPipeline | ||
from diffusers.utils import export_to_video | ||
|
||
vae = AutoModel.from_pretrained( | ||
    "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", subfolder="vae", torch_dtype=torch.float32 | ||
) | ||
pipeline = SkyReelsV2DiffusionForcingPipeline.from_pretrained( | ||
    "Skywork/SkyReels-V2-DF-1.3B-540P-Diffusers", vae=vae, torch_dtype=torch.bfloat16 | ||
) | ||
pipeline.to("cuda") | ||
|
||
pipeline.load_lora_weights("benjamin-paine/steamboat-willie-1.3b", adapter_name="steamboat-willie") | ||
pipeline.set_adapters("steamboat-willie") | ||
|
||
pipeline.enable_model_cpu_offload() | ||
|
||
# use "steamboat willie style" to trigger the LoRA | ||
prompt = """ | ||
steamboat willie style, golden era animation, The camera rushes from far to near in a low-angle shot, | ||
revealing a white ferret on a log. It plays, leaps into the water, and emerges, as the camera zooms in | ||
for a close-up. Water splashes berry bushes nearby, while moss, snow, and leaves blanket the ground. | ||
Birch trees and a light blue sky frame the scene, with ferns in the foreground. Side lighting casts dynamic | ||
shadows and warm highlights. Medium composition, front view, low angle, with depth of field. | ||
""" | ||
|
||
output = pipeline( | ||
    prompt=prompt, | ||
    num_frames=97, | ||
    guidance_scale=6.0, | ||
).frames[0] | ||
export_to_video(output, "output.mp4", fps=24) | ||
``` | ||
|
||
</details> | ||
|
||
|
||
## SkyReelsV2DiffusionForcingPipeline | ||
|
||
[[autodoc]] SkyReelsV2DiffusionForcingPipeline | ||
  - all | ||
  - __call__ | ||
|
||
## SkyReelsV2DiffusionForcingImageToVideoPipeline | ||
|
||
[[autodoc]] SkyReelsV2DiffusionForcingImageToVideoPipeline | ||
  - all | ||
  - __call__ | ||
|
||
## SkyReelsV2DiffusionForcingVideoToVideoPipeline | ||
|
||
[[autodoc]] SkyReelsV2DiffusionForcingVideoToVideoPipeline | ||
  - all | ||
  - __call__ | ||
|
||
## SkyReelsV2Pipeline | ||
|
||
[[autodoc]] SkyReelsV2Pipeline | ||
  - all | ||
  - __call__ | ||
|
||
## SkyReelsV2ImageToVideoPipeline | ||
|
||
[[autodoc]] SkyReelsV2ImageToVideoPipeline | ||
  - all | ||
  - __call__ | ||
|
||
## SkyReelsV2PipelineOutput | ||
|
||
[[autodoc]] pipelines.skyreels_v2.pipeline_output.SkyReelsV2PipelineOutput |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
<!--Copyright 2024 The HuggingFace Team. All rights reserved. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations under the License. | ||
--> | ||
|
||
# FlowMatchUniPCMultistepScheduler | ||
|
||
`FlowMatchUniPCMultistepScheduler` is based on the flow-matching sampling introduced in [Stable Diffusion 3](https://huggingface.co/papers/2403.03206). | ||
|
||
## FlowMatchUniPCMultistepScheduler | ||
[[autodoc]] FlowMatchUniPCMultistepScheduler |
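
Flow-matching samplers in the Stable Diffusion 3 style usually expose a `shift` value that pushes the noise levels toward the high-noise end of the schedule. As a hedged illustration, the sketch below applies the commonly used time-shift formula `sigma' = shift * sigma / (1 + (shift - 1) * sigma)` to a linear schedule. Whether `FlowMatchUniPCMultistepScheduler` uses exactly this formula is an assumption, and `shifted_sigmas` is a hypothetical helper, not part of Diffusers.

```python
def shifted_sigmas(num_steps: int, shift: float) -> list[float]:
    """Linear sigmas from 1.0 downwards, remapped by the time-shift formula."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    # Linear schedule 1.0, 1 - 1/n, ..., 1/n (the final 0.0 is left out)
    sigmas = [1.0 - step / num_steps for step in range(num_steps)]
    # shift == 1.0 leaves the schedule unchanged; larger values keep sigmas high for longer
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


print([round(s, 3) for s in shifted_sigmas(4, shift=1.0)])
print([round(s, 3) for s in shifted_sigmas(4, shift=8.0)])
```

With `shift=8.0`, the midpoint sigma of 0.5 is mapped to roughly 0.889, so more of the sampling steps are spent at high noise levels.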
cc @yiyixuxu Do we have contact with the SkyReels team, and do we know if they would be okay with hosting the weights? If it's not possible, we could maintain a `skyreels-community` org, similar to hunyuan.

I think so, let me check

they already made the empty repos here https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-720P-Diffusers