[runtime] Spike in memory usage when running VAE ("segmind/SSD-1B", "stabilityai/stable-diffusion-2") #5924
Comments
Here is my memory history, captured as per https://pytorch.org/docs/stable/torch_cuda_memory.html. It seems like an issue with how cuDNN allocates memory for conv.
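For reference, here is a minimal sketch of how a memory-history snapshot like the one above can be captured, assuming a CUDA device and the fp16 SD2 checkpoint (the prompt and step count are arbitrary):

```python
import torch
from diffusers import StableDiffusionPipeline

# Start recording allocator events, as described in
# https://pytorch.org/docs/stable/torch_cuda_memory.html
torch.cuda.memory._record_memory_history(max_entries=100_000)

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")
pipe("a photo of an astronaut riding a horse", num_inference_steps=20)

# Dump a snapshot that can be inspected at https://pytorch.org/memory_viz
torch.cuda.memory._dump_snapshot("vae_spike_snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)  # stop recording
```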
The difference seems to be in diffusers' implementation; investigation underway.
Seems to be due to upsampling: it causes the conv activations to be extremely large (300/600 MB). EDIT: Found the culprit - activations of size 256 * 768 * 768 * 2 bytes = ~300 MB (256 channels for an image of size 768 * 768 at fp16 precision).
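As a sanity check on the arithmetic above (shapes and dtype come from the comment; batch size 1 is assumed):

```python
# Size of a single 256-channel fp16 feature map at 768x768 resolution.
channels, height, width = 256, 768, 768
bytes_per_element = 2  # fp16
activation_bytes = channels * height * width * bytes_per_element
print(f"{activation_bytes / 2**20:.0f} MiB (~{activation_bytes / 1e6:.0f} MB)")
# -> 288 MiB (~302 MB), i.e. the ~300 MB spike reported above
```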
The sources of the largest memory spikes are: ... Not sure why there are 2 allocations. Honestly, it seems that the ... Also, it's weird to me that the upsample block has 256 channels while the next resblock run has only 128 channels... Maybe relevant references: ...
Thanks for the super cool investigation here, @jon-chuang. However, I don't have any concrete suggestions to reduce the memory usage here, as the blocks that seem to be causing the spikes are known to be memory-intensive. Does it help to use FP16 precision, if it's not being used already?
Yep, I was just wondering if there is a way to reduce the peak memory allocated, e.g. by configuring cuDNN or the generated Triton code to prefer using less memory. I'm actually already running fp16. See also pytorch/pytorch#31500, pytorch/pytorch#49207.
Thanks for sharing your findings. I don't really see a concrete action item for us here. But I will keep the issue open in case someone else stumbles upon it.
Sure. It was unintuitive that the activations could use so much memory, as I was used to thinking that model weights dominate the memory cost, but I guess for vision models this is expected.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, we solved this problem with sequence parallelism across multiple devices and chunked input.
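For single-device use, a related mitigation that ships with diffusers (not the multi-device sequence-parallel approach mentioned above) is to decode in slices/tiles so the largest VAE activations never exist all at once. A rough sketch, assuming a recent diffusers release with these pipeline methods:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

pipe.enable_vae_slicing()  # decode one image of a batch at a time
pipe.enable_vae_tiling()   # decode each image in overlapping spatial tiles

image = pipe("a photo of an astronaut riding a horse").images[0]
```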
Describe the bug
When running inference, the VAE decoders for SD2 and SSD-1B are only ~1/10th the size of the UNet, yet they cause a sharp GPU memory spike (2 GB+) when run, leading to OOM on my device.
I would be able to help investigate if advice is provided. I think there is somewhere we can release some unused memory early.
Reproduction
Watch the memory usage when the VAE is applied for the following checkpoints; a rough reproduction sketch follows the list:
"stabilityai/stable-diffusion-2"
"segmind/SSD-1B"
Logs
System Info
Laptop 4080
torch nightly
diffusers Version: 0.23.1
Who can help?
cc: @sayakpaul @DN6 @yiyixuxu @patrickvonplaten for VAE expertise