Below are example videos showcasing the enhanced video quality achieved through STG:
- `hunyuan1.mp4`, `hunyuan2.mp4`
- `mochi1.mp4`, `mochi2.mp4`
- `cogvideox1.mp4`, `cogvideox2.mp4`
- `ltxvideo1.mp4`
- `svd1.mp4`
## 🍡 Mochi

For installation and requirements, refer to the official repository.

Update `demos/config.py` with your desired settings and simply run:

```bash
python ./demos/cli.py
```
## 🌌 HunyuanVideo

For installation and requirements, refer to the official repository.

Using CFG (Default Model):

```bash
torchrun --nproc_per_node=4 sample_video.py \
    --video-size 544 960 \
    --video-length 65 \
    --infer-steps 50 \
    --prompt "A time traveler steps out of a glowing portal into a Victorian-era street filled with horse-drawn carriages, realistic style." \
    --flow-reverse \
    --seed 42 \
    --ulysses-degree 4 \
    --ring-degree 1 \
    --save-path ./results
```

To utilize STG, use the following command:

```bash
torchrun --nproc_per_node=4 sample_video.py \
    --video-size 544 960 \
    --video-length 65 \
    --infer-steps 50 \
    --prompt "A time traveler steps out of a glowing portal into a Victorian-era street filled with horse-drawn carriages, realistic style." \
    --flow-reverse \
    --seed 42 \
    --ulysses-degree 4 \
    --ring-degree 1 \
    --save-path ./results \
    --stg-mode "STG-R" \
    --stg-block-idx 2 \
    --stg-scale 2.0
```
Key Parameters:
- `stg_mode`: Only `STG-R` is supported.
- `stg_scale`: `2.0` is recommended.
- `stg_block_idx`: Specify the block index for applying STG.
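Conceptually, STG augments classifier-free guidance with an extra term that steers the denoiser away from a self-perturbed prediction. The sketch below is only an illustration of that idea: the function name, arguments, and scalar stand-ins for the latent tensors are all hypothetical, not the repository's actual update rule.

```python
def stg_guidance(eps_uncond, eps_cond, eps_perturbed, cfg_scale, stg_scale):
    """Toy combination of CFG and STG guidance terms (scalars stand in
    for noise-prediction tensors; names are hypothetical)."""
    # Standard classifier-free guidance: move from the unconditional
    # prediction toward the conditional one.
    cfg = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
    # STG: additionally move away from the self-perturbed prediction.
    return cfg + stg_scale * (eps_cond - eps_perturbed)
```

With `stg_scale = 0.0` (or when the perturbed branch matches the conditional one) this reduces to plain CFG, which is why the scripts above treat CFG as the default.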
## 🏎️ LTX-Video

For installation and requirements, refer to the official repository.

Using CFG (Default Model):

```bash
python inference.py --ckpt_dir './weights' --prompt "A man ..."
```

To utilize STG, use the following command:

```bash
python inference.py --ckpt_dir './weights' --prompt "A man ..." \
    --stg_mode stg-a --stg_scale 1.0 --stg_block_idx 19 --do_rescaling True
```
Key Parameters:
- `stg_mode`: Choose between `stg-a` or `stg-r`.
- `stg_scale`: Recommended values are ≤ 2.0.
- `stg_block_idx`: Specify the block index for applying STG.
- `do_rescaling`: Set to `True` to enable rescaling.
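The `do_rescaling` option presumably follows the common guidance-rescaling trick: match the guided prediction's standard deviation to that of the conditional prediction, then blend with the unrescaled result. The sketch below is an assumption, not the repository's implementation; the function name, the blend factor `phi`, and the use of plain lists in place of tensors are all hypothetical.

```python
import statistics

def rescale_guidance(guided, cond, phi=0.7):
    """Toy guidance rescaling: shrink/grow the guided prediction so its
    population std matches the conditional one, then blend by phi.
    (Hypothetical sketch; real pipelines do this per-sample on tensors.)"""
    scale = statistics.pstdev(cond) / statistics.pstdev(guided)
    rescaled = [g * scale for g in guided]
    # Blend the rescaled and original guided predictions.
    return [phi * r + (1.0 - phi) * g for r, g in zip(rescaled, guided)]
```

The intent of such rescaling is to curb the over-saturation that large guidance scales can cause, which is why it is exposed as an opt-in flag here.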
## 🧪 Diffusers

The Diffusers implementation currently supports Mochi, HunyuanVideo, CogVideoX, and SVD.

To run the test script, refer to the `test.py` file in each folder. Below is an example using Mochi:

```python
# test.py
import torch
from pipeline_stg_mochi import MochiSTGPipeline
from diffusers.utils import export_to_video
import os

# Load the pipeline
pipe = MochiSTGPipeline.from_pretrained(
    "genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16
)
pipe.enable_vae_tiling()
pipe = pipe.to("cuda")

#--------Option--------#
prompt = "A slow-motion capture of a beautiful woman in a flowing dress spinning in a field of sunflowers, with petals swirling around her, realistic style."
stg_mode = "STG-R"
stg_applied_layers_idx = [35]
stg_scale = 0.8  # 0.0 for CFG (default)
do_rescaling = True  # False (default)
#----------------------#

# Generate video frames
frames = pipe(
    prompt,
    num_frames=84,
    stg_mode=stg_mode,
    stg_applied_layers_idx=stg_applied_layers_idx,
    stg_scale=stg_scale,
    do_rescaling=do_rescaling,
).frames[0]
...
```
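The `stg_applied_layers_idx` option selects which transformer blocks are perturbed when producing the extra STG forward pass; under STG-R the perturbation amounts to skipping part of the selected blocks' residual computation (e.g., the attention branch). The toy sketch below illustrates that idea only; the names, the `(attn, mlp)` block structure, and the float stand-ins for tensors are all hypothetical.

```python
def run_blocks(x, blocks, stg_applied_layers_idx=()):
    """Toy transformer stack: each block is an (attn, mlp) pair of
    residual branches. Blocks listed in stg_applied_layers_idx skip
    their attention branch, yielding the 'perturbed' prediction."""
    for i, (attn, mlp) in enumerate(blocks):
        if i not in stg_applied_layers_idx:
            x = x + attn(x)  # residual attention, skipped under STG-R
        x = x + mlp(x)       # residual MLP always runs
    return x
```

Running the stack twice, once with and once without the skip, gives the two predictions whose difference the guidance term amplifies.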
For details on memory efficiency, inference acceleration, and more, refer to the original pages below:
- Implement STG on diffusers
- Update STG with Open-Sora, SVD
This project is built upon the following works: