Multi-GPU Inference Support or Video Splitting for Long Video Processing #6
Comments
I am working on processing arbitrarily long videos. The update will be released in two days.
Hi, please check the results here: #8
Sure, I'll check this ASAP, thanks!
Has any checkpoint changed that I need to load?
The checkpoints are the same as the previous ones. The laion2b_s32b_b79k model is: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/resolve/main/open_clip_pytorch_model.bin
It is very slow. Is that normal when running on a single A100?
Unfortunately, yes, it is normal. It makes sense because you are processing high-resolution, high-frame-rate videos.
How do I configure that? I didn't see it in the README. And by the way, is it absolutely necessary to set the prompt to be the same as the one used to generate the video in CogVideoX?
Multi-GPU inference is not supported right now, but we are working on it.
Oh, that’s an issue, because CogVideoX supports long text, typically exceeding 77 words and usually around 150-220 words. I’d like to know how to reproduce your rendered video. How should the prompt be written, given that the original video prompt is longer than 77 words?
I only use the first sentence. For example: "The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope." The results I present in the README were not produced by the released VEnhancer checkpoint. The released VEnhancer has powerful generative ability and is more suitable for lower-quality, lower-resolution AIGC videos. But CogVideoX can already produce good videos, so I used another checkpoint just for enhancing temporal consistency and removing unpleasant textures.
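Since CLIP's text encoder truncates prompts at 77 tokens, keeping only the first sentence of a long CogVideoX prompt avoids silent truncation. A minimal sketch of that trimming step (`first_sentence` is a hypothetical helper, not part of VEnhancer):

```python
def first_sentence(prompt: str) -> str:
    """Return only the first sentence of a long prompt.

    Hypothetical helper: CLIP truncates text at 77 tokens, so passing
    just the first sentence keeps the prompt within the encoder's limit.
    """
    for sep in (". ", "! ", "? "):
        idx = prompt.find(sep)
        if idx != -1:
            return prompt[: idx + 1].strip()
    return prompt.strip()

long_prompt = (
    "The camera follows behind a white vintage SUV with a black roof rack "
    "as it speeds up a steep dirt road surrounded by pine trees on a steep "
    "mountain slope. Dust kicks up from its tires as it drives."
)
print(first_sentence(long_prompt))
# → The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope.
```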
So, with the released version, it’s possible to reproduce the results if I only use the first sentence of the prompt?
With the released checkpoint, up_scale=3, noise_aug=200, target_fps=24, and prompt="A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea", it will produce results like the attached A.detailed.wooden.toy.ship.with.intricately.carved.masts.and.sails.is.seen.gliding.smoothly.over.a.plush.blue.carpet.mp4. If you are happy with this, you can use the above parameters.
Does this guide work (I tested it and it works for me)? If it's OK, I will push it.
- up_scale is recommended to be set to 3 or 4, or to 2 if the resolution of the input video is already high. The target resolution is limited to around 2K and below. Those are my comments. Thanks for your work!
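The recommendation above can be sketched as a small helper that picks the largest up_scale whose output stays around 2K or below (`choose_up_scale` and the 2160-pixel cap are assumptions for illustration; VEnhancer itself just takes up_scale as a parameter):

```python
def choose_up_scale(width: int, height: int, max_side: int = 2160) -> int:
    """Pick the largest up_scale in {4, 3, 2} keeping the longest output
    side around 2K (here approximated as 2160 px) or below.

    Hypothetical helper mirroring the advice above; fall back to 1 when
    the input is already near the resolution limit.
    """
    for scale in (4, 3, 2):
        if max(width, height) * scale <= max_side:
            return scale
    return 1

# A 480x720 CogVideoX clip: 720 * 3 = 2160, so up_scale=3 is chosen.
print(choose_up_scale(480, 720))  # → 3
# A 1080x1920 clip is already high-resolution: even 2x exceeds the cap.
print(choose_up_scale(1080, 1920))  # → 1
```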
We are working with videos that range from 6 to 10 seconds in length, which leads to out-of-memory (OOM) errors during processing. We have access to high-performance hardware, such as multiple A100 GPUs.
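Until official multi-GPU support lands, one common workaround is to split a long clip into overlapping frame chunks, enhance each chunk on a separate GPU, and blend the overlapping frames when stitching. A minimal sketch of the chunking step (this is one possible workaround, not VEnhancer's own code; chunk length and overlap are illustrative):

```python
def split_frames(num_frames: int, chunk_len: int, overlap: int):
    """Split a long clip into overlapping (start, end) frame-index chunks.

    Sketch of a workaround for OOM on long videos: each chunk fits in one
    GPU's memory, and the `overlap` frames let adjacent chunks be blended
    when the enhanced results are stitched back together.
    """
    step = chunk_len - overlap
    chunks, start = [], 0
    while start + chunk_len < num_frames:
        chunks.append((start, start + chunk_len))
        start += step
    chunks.append((start, num_frames))  # final (possibly shorter) chunk
    return chunks

# e.g. a 10 s clip at 24 fps = 240 frames, 64-frame chunks, 8-frame overlap,
# round-robined over 4 GPUs:
for i, (s, e) in enumerate(split_frames(240, 64, 8)):
    print(f"GPU {i % 4}: frames [{s}, {e})")
```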