
Multi-GPU out of memory in latent decoding process #107

Open
maflx opened this issue Dec 4, 2024 · 2 comments


maflx commented Dec 4, 2024

Running with 4x A30 GPUs (24 GB each), I ran out of memory while decoding latents during inference.
The multi-GPU path does not use tiled decoding, but I think it should so that less VRAM is needed.

As a workaround, I hardcoded a change from:

frames = decode_latents(ctx.decoder, latents)

to:

frames = decode_latents_tiled_spatial(
    ctx.decoder, latents,
    num_tiles_w=8, num_tiles_h=4,
    overlap=4,
)

With that change I've been able to generate the video.
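
For reference, here is the same workaround written out with an import; a sketch only, assuming the decode helpers are exported from genmo.mochi_preview.pipelines (check the module path in your checkout):

# Sketch of the workaround above; the import path is an assumption.
from genmo.mochi_preview.pipelines import decode_latents_tiled_spatial

frames = decode_latents_tiled_spatial(
    ctx.decoder,
    latents,
    num_tiles_w=8,  # more, smaller tiles -> lower peak VRAM per decode step
    num_tiles_h=4,
    overlap=4,      # overlap between adjacent tiles, used to blend away seams
)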

@ved-genmo (Contributor) commented:

I suppose what we should really do is calculate the total amount of VRAM, and then use that to figure out whether or not we should enable tiled decoding.
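
A minimal sketch of that heuristic, assuming the decode helpers are importable from genmo.mochi_preview.pipelines and using an arbitrary 32 GiB free-memory threshold (both are assumptions, not the repo's actual logic):

import torch
from genmo.mochi_preview.pipelines import decode_latents, decode_latents_tiled_spatial

def decode_with_fallback(decoder, latents, min_free_gib=32):
    # Hypothetical helper: query free VRAM on the current device and pick
    # the decode path accordingly. The 32 GiB cutoff is a placeholder.
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    if free_bytes / 2**30 >= min_free_gib:
        # Enough headroom: decode the whole latent tensor in one pass.
        return decode_latents(decoder, latents)
    # Otherwise fall back to spatial tiling to cap peak VRAM.
    return decode_latents_tiled_spatial(
        decoder, latents, num_tiles_w=8, num_tiles_h=4, overlap=4
    )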


xsolo commented Jan 9, 2025

Same here, running on 2x A100 GPUs with 40 GB of VRAM each.
Now I'm hitting another issue:

torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:317, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.21.5
ncclUnhandledCudaError: Call to CUDA function failed.
Last error:
Cuda failure 2 'out of memory'
