
Multi-GPU Inference Support or Video Splitting for Long Video Processing #6

Open
zRzRzRzRzRzRzR opened this issue Aug 18, 2024 · 15 comments

Comments

@zRzRzRzRzRzRzR

We are working with videos that range from 6 to 10 seconds in length, which obviously leads to Out Of Memory (OOM) errors during processing. We have access to high-performance hardware, such as multiple A100 GPUs.

  1. Is there a way to implement multi-GPU inference to handle these longer videos? If so, could you provide guidance on how to set it up?
  2. If multi-GPU inference is not supported, is there a method to split the video into smaller segments for processing? We are concerned that splitting the video might degrade the final output quality. Could you suggest the best practices to minimize quality loss in this scenario?
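
For point 2, a common generic workaround (not an API of this repository, just a hedged sketch of the idea) is to split the frame sequence into overlapping temporal chunks, enhance each chunk independently, and average the overlapping frames to hide seams. In the sketch below, enhance_chunk, chunk, and overlap are placeholder names, and the per-chunk call is assumed to return frames with the same shape it was given:

# Hedged sketch: enhance a long clip in overlapping temporal chunks and blend
# the overlaps by averaging. `enhance_chunk` is a placeholder for whatever
# per-chunk model call you use; it is NOT a function from this repository.
import torch

def enhance_in_chunks(frames: torch.Tensor, enhance_chunk, chunk: int = 32, overlap: int = 8):
    """frames: [T, C, H, W] -> enhanced frames of the same shape."""
    T = frames.shape[0]
    out = torch.zeros_like(frames, dtype=torch.float32)
    weight = torch.zeros(T, 1, 1, 1)
    start = 0
    while start < T:
        end = min(start + chunk, T)
        out[start:end] += enhance_chunk(frames[start:end]).float()
        weight[start:end] += 1.0
        if end == T:
            break
        start = end - overlap  # step forward, keeping `overlap` frames shared
    return (out / weight).to(frames.dtype)

The overlap is what limits visible seams, at the cost of extra compute. Whether this preserves quality for a diffusion-based enhancer is exactly the concern raised above, so it is worth validating on a short clip first.
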
@hejingwenhejingwen
Collaborator

I am working on processing arbitrarily long videos. The update will be released in two days.

@hejingwenhejingwen
Collaborator

Hi, please check the results here: #8

@zRzRzRzRzRzRzR
Author

Sure, I'll check this ASAP, thanks!

@zRzRzRzRzRzRzR
Author

Has any checkpoint changed? I found that I need to load the laion2b_s32b_b79k model.

@hejingwenhejingwen
Collaborator

The checkpoints are the same as the previous ones. The laion2b_s32b_b79k model is: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin
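
If you want to pre-download or sanity-check those weights, here is a minimal sketch using the open_clip package (this only fetches and builds the model; the repo itself may still expect the .bin file at its own configured local path):

# Minimal sketch (assumes the `open_clip` package is installed): downloads the
# ViT-H-14 weights tagged laion2b_s32b_b79k from the Hugging Face Hub and builds
# the model, i.e. the same checkpoint the FrozenOpenCLIPEmbedder uses.
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")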

@zRzRzRzRzRzRzR
Author

/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
2024-08-20 13:25:17,553 - video_to_video - INFO - checkpoint_path: ./ckpts/venhancer_paper.pt
/share/home/zyx/.conda/envs/cogvideox/lib/python3.10/site-packages/open_clip/factory.py:88: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(checkpoint_path, map_location=map_location)
2024-08-20 13:25:37,486 - video_to_video - INFO - Build encoder with FrozenOpenCLIPEmbedder
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:35: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  load_dict = torch.load(cfg.model_path, map_location='cpu')
2024-08-20 13:25:55,391 - video_to_video - INFO - Load model path ./ckpts/venhancer_paper.pt, with local status <All keys matched successfully>
2024-08-20 13:25:55,392 - video_to_video - INFO - Build diffusion with GaussianDiffusion
2024-08-20 13:26:16,092 - video_to_video - INFO - input video path: inputs/000000.mp4
2024-08-20 13:26:16,093 - video_to_video - INFO - text: Wide-angle aerial shot at dawn,soft morning light casting long shadows,an elderly man walking his dog through a quiet,foggy park,trees and benches in the background,peaceful and serene atmosphere
2024-08-20 13:26:16,156 - video_to_video - INFO - input frames length: 49
2024-08-20 13:26:16,156 - video_to_video - INFO - input fps: 8.0
2024-08-20 13:26:16,156 - video_to_video - INFO - target_fps: 24.0
2024-08-20 13:26:16,311 - video_to_video - INFO - input resolution: (480, 720)
2024-08-20 13:26:16,312 - video_to_video - INFO - target resolution: (1320, 1982)
2024-08-20 13:26:16,312 - video_to_video - INFO - noise augmentation: 250
2024-08-20 13:26:16,312 - video_to_video - INFO - scale s is set to: 8
2024-08-20 13:26:16,399 - video_to_video - INFO - video_data shape: torch.Size([145, 3, 1320, 1982])
/share/home/zyx/Code/VEnhancer/video_to_video/video_to_video_model.py:108: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with amp.autocast(enabled=True):
2024-08-20 13:27:19,605 - video_to_video - INFO - step: 0
2024-08-20 13:30:12,020 - video_to_video - INFO - step: 1
2024-08-20 13:33:04,956 - video_to_video - INFO - step: 2
2024-08-20 13:35:58,691 - video_to_video - INFO - step: 3
2024-08-20 13:38:51,254 - video_to_video - INFO - step: 4
2024-08-20 13:41:44,150 - video_to_video - INFO - step: 5
2024-08-20 13:44:37,017 - video_to_video - INFO - step: 6
2024-08-20 13:47:30,037 - video_to_video - INFO - step: 7
2024-08-20 13:50:22,838 - video_to_video - INFO - step: 8
2024-08-20 13:53:15,844 - video_to_video - INFO - step: 9
2024-08-20 13:56:08,657 - video_to_video - INFO - step: 10
2024-08-20 13:59:01,648 - video_to_video - INFO - step: 11
2024-08-20 14:01:54,541 - video_to_video - INFO - step: 12
2024-08-20 14:04:47,488 - video_to_video - INFO - step: 13
2024-08-20 14:10:13,637 - video_to_video - INFO - sampling, finished.

So slow. Is this normal, running on a single A100?

@hejingwenhejingwen
Collaborator

Sadly, it is normal. It makes sense because you are processing high-resolution and high-frame-rate videos.
Multi-GPU inference may help, but don't expect too much :(
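
For reference, a quick sanity check on the timestamps in the log above (step 0 at 13:27:19, step 13 at 14:04:47) matches the observed wall time:

# Timestamps copied from the log above; a rough per-step estimate only.
from datetime import datetime

t0  = datetime.strptime("13:27:19", "%H:%M:%S")   # step 0
t13 = datetime.strptime("14:04:47", "%H:%M:%S")   # step 13
per_step = (t13 - t0) / 13
print(per_step)        # ~173 s per diffusion step
print(per_step * 15)   # ~43 min for the 15 fixed "fast" solver steps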

@zRzRzRzRzRzRzR
Author

zRzRzRzRzRzRzR commented Aug 20, 2024

Sadly, it is normal. It makes sense because you are processing high-resolution and high-frame-rate videos. Multi-GPU inference may help, but don't expect too much :(

How do I configure that? I did not see it in the README. And by the way, is it absolutely necessary to set the prompt to be the same as the one used to generate the video in CogVideoX?

@hejingwenhejingwen
Collaborator

Multi-GPU inference is not supported right now, but we are working on it.
VEnhancer is trained mostly with short captions, so I am not sure it can understand long captions. It may generate unpleasant textures (not sure) if you provide too many words. More importantly, the CLIP model we use has a maximum context of 77 tokens.
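
To see that limit concretely, a small check with the open_clip tokenizer (assuming that is the tokenizer behind the embedder) shows that anything past the 77-token context is silently truncated rather than rejected:

# The CLIP text tokenizer has a fixed 77-token context; longer prompts are cut off.
import open_clip

tokenizer = open_clip.get_tokenizer("ViT-H-14")
long_prompt = "Wide-angle aerial shot at dawn, soft morning light casting long shadows, " * 10
tokens = tokenizer([long_prompt])
print(tokens.shape)   # torch.Size([1, 77]) -- tokens beyond the context window are dropped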

@zRzRzRzRzRzRzR
Author

Oh, that’s an issue because CogVideoX supports long text, typically exceeding 77 words, usually around 150-220 words.

I’d like to know how to reproduce your rendered video. How should the prompt be written, given that the original video prompt is longer than 77 words?

@hejingwenhejingwen
Collaborator

I only adopt the first sentence. For example: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope.

The results I present in the README are not produced by the released VEnhancer checkpoint. The released VEnhancer has powerful generative ability and is more suitable for lower-quality, lower-resolution AIGC videos. But CogVideoX can already produce good videos, so I used another checkpoint that just enhances temporal consistency and removes unpleasant textures.
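
A tiny illustrative helper (hypothetical, not part of either repository) for trimming a long CogVideoX caption down to its first sentence before passing it to VEnhancer:

# Hypothetical helper: keep only the first sentence of a long caption.
def first_sentence(caption: str) -> str:
    head = caption.split(". ")[0].strip()
    return head if head.endswith(".") else head + "."

prompt = first_sentence(
    "The camera follows behind a white vintage SUV with a black roof rack as it speeds up "
    "a steep dirt road surrounded by pine trees on a steep mountain slope. "
    "Additional descriptive sentences would follow here."   # placeholder tail
)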

@zRzRzRzRzRzRzR
Author

So, with the released version, it's possible to reproduce the results if I only use the first sentence of the prompt?
I'm currently writing the quick start guide for this and preparing to post it in the CogVideoX community, so I need to confirm this issue :)

@hejingwenhejingwen
Collaborator

hejingwenhejingwen commented Aug 20, 2024

The released ckpt; up_scale=3; noise_aug=200; target_fps=24, prompt="A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea"

It will produce results like this:

A.detailed.wooden.toy.ship.with.intricately.carved.masts.and.sails.is.seen.gliding.smoothly.over.a.plush.blue.carpet.mp4

If you are happy with this, you can use the above parameters.
Actually, up_scale can be set to 2 if you cannot wait, but the quality will degrade. Besides, fps >= 16 is already very smooth, so you can also lower the target_fps to 16. noise_aug controls the refinement strength; it depends on the user's preference.
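
For completeness, this is roughly how those settings could be passed to the inference script. The flag names below are an assumption on my part, mirroring the parameter names discussed above; please verify them against enhance_a_video.py (or the README) before using them:

# Assumed invocation; flag names may differ from the script's actual CLI --
# check enhance_a_video.py in the VEnhancer repo for the exact argument names.
import subprocess

subprocess.run([
    "python", "enhance_a_video.py",
    "--input_path", "inputs/000000.mp4",
    "--prompt", "A detailed wooden toy ship with intricately carved masts and sails is seen "
                "gliding smoothly over a plush, blue carpet that mimics the waves of the sea",
    "--up_scale", "3",
    "--noise_aug", "200",
    "--target_fps", "24",
], check=True)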

@zRzRzRzRzRzRzR
Author

@hejingwenhejingwen
Collaborator

hejingwenhejingwen commented Aug 20, 2024

https://github.com/THUDM/CogVideo/pull/143/files#diff-9e657cda0980a4aee4b86550d3640347df4f55f3ac3a827132471681fdc7f52c

Does this guide work? (I tested it and it works for me.) If it is OK, I will push it.

- up_scale is recommended to be set to 3 or 4, or to 2 if the resolution of the input video is already high. The target resolution is limited to around 2K and below.
- The noise_aug value depends on the input video quality. Lower quality needs higher noise levels, which correspond to stronger refinement. 250~300 is for very low-quality videos; for good videos, use <= 200.
- If you want fewer steps, change solver_mode to "normal" first, then reduce the number of steps. The "fast" solver_mode has a fixed number of steps (15).

These are my comments. Thanks for your work!
