
Multi-GPU out of memory in latent decoding process #107

Open
maflx opened this issue Dec 4, 2024 · 2 comments


maflx commented Dec 4, 2024

Running with 4x A30 GPUs (24 GB each), I ran out of memory while decoding latents during inference.
The multi-GPU path does not use tiled decoding, but I think it should so that less VRAM is needed.

As a workaround, I hardcoded a change from:

frames = decode_latents(ctx.decoder, latents)

to:

frames = decode_latents_tiled_spatial(
    ctx.decoder, latents,
    num_tiles_w=8, num_tiles_h=4,
    overlap=4,
)

With that change I've been able to generate the video.
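
For reference, here is the same workaround written out with an import; a sketch only, assuming the decode helpers are exported from genmo.mochi_preview.pipelines (check the module path in your checkout):

# Sketch of the workaround above; the import path is an assumption.
from genmo.mochi_preview.pipelines import decode_latents_tiled_spatial

frames = decode_latents_tiled_spatial(
    ctx.decoder,
    latents,
    num_tiles_w=8,  # more, smaller tiles -> lower peak VRAM per decode step
    num_tiles_h=4,
    overlap=4,      # overlap between adjacent tiles, used to blend away seams
)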

@ved-genmo (Contributor) commented:

I suppose what we should really do is calculate the total amount of VRAM, and then use that to figure out whether or not we should enable tiled decoding.
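
A minimal sketch of that heuristic, assuming the decode helpers are importable from genmo.mochi_preview.pipelines and using an arbitrary 32 GiB free-memory threshold (both are assumptions, not the repo's actual logic):

import torch
from genmo.mochi_preview.pipelines import decode_latents, decode_latents_tiled_spatial

def decode_with_fallback(decoder, latents, min_free_gib=32):
    # Hypothetical helper: query free VRAM on the current device and pick
    # the decode path accordingly. The 32 GiB cutoff is a placeholder.
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    if free_bytes / 2**30 >= min_free_gib:
        # Enough headroom: decode the whole latent tensor in one pass.
        return decode_latents(decoder, latents)
    # Otherwise fall back to spatial tiling to cap peak VRAM.
    return decode_latents_tiled_spatial(
        decoder, latents, num_tiles_w=8, num_tiles_h=4, overlap=4
    )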


xsolo commented Jan 9, 2025

Same here, running on 2x A100 GPUs with 40 GB of VRAM each.
Now I'm hitting another issue:

torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:317, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.21.5
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 2 'out of memory'
