I'm trying to run ResNet50 pretraining on my own data and am running into out-of-memory (OOM) errors.
Initially I used two V100s (32 GB each), and the largest batch size I could use was 256. Moving to GPUs with more memory didn't help: on both a 40 GB and an 80 GB A100, the largest batch size that ran without OOM errors was still 256.
I'm a bit confused and wonder whether there's a gap in my understanding. Let me know if I'm missing anything!
Hi @knightron0, if a batch size of 256 maxes out a 32 GB V100, then a 40 GB A100 should behave similarly, since it only has 8 GB more memory.
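One rough way to see why: assume (a simplification not stated in this thread) that peak memory is a fixed overhead for weights, gradients, and optimizer state plus an activation cost that grows linearly with batch size. Then the largest batch that fits scales with the memory left after the fixed part. The helpers `max_batch_estimate` and `largest_power_of_two_at_most` below are hypothetical, and the 2 GB overhead is an illustrative placeholder, not a measurement.

```python
def max_batch_estimate(
    known_batch: int, known_mem_gb: float, target_mem_gb: float, fixed_gb: float
) -> int:
    """Estimate the largest batch that fits on a target GPU.

    Hypothetical linear model: peak memory = fixed_gb + per_sample_gb * batch,
    calibrated so that `known_batch` exactly fills `known_mem_gb`.
    """
    if not fixed_gb < known_mem_gb:
        raise ValueError("fixed overhead must be smaller than the known GPU memory")
    per_sample_gb = (known_mem_gb - fixed_gb) / known_batch
    return int((target_mem_gb - fixed_gb) // per_sample_gb)


def largest_power_of_two_at_most(n: int) -> int:
    """Largest power of two <= n (for when only 64, 128, 256, ... are tried)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n.bit_length() - 1)


if __name__ == "__main__":
    # 256 maxes out a 32 GB V100 (from the thread); 2 GB overhead is a placeholder.
    est = max_batch_estimate(256, 32, 40, fixed_gb=2.0)
    print(est, largest_power_of_two_at_most(est))  # 324 256
```

With these placeholder numbers a 40 GB card leaves room for only about 25% more samples, so if batch sizes are doubled, the next step (512) would not fit.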
FYI: we pretrain ResNet50 on 32 × 80 GB A100s with a per-GPU batch size of 128, and that works fine.
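With data parallelism, the per-GPU batch size and the global batch size differ by a factor of the GPU count, so it helps to be explicit about which one is meant. A minimal sketch of that arithmetic, using hypothetical helper names: the maintainer's setup gives a global batch of 4096, and if the reporter's 256 on two V100s was the combined batch (an assumption; the thread doesn't say), each V100 held only 128 samples.

```python
def global_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    """Samples per optimizer step when each data-parallel GPU holds per_gpu_batch samples."""
    return num_gpus * per_gpu_batch


def per_gpu_batch_size(num_gpus: int, global_batch: int) -> int:
    """Samples each GPU holds when a global batch is split evenly across GPUs."""
    if global_batch % num_gpus:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // num_gpus


if __name__ == "__main__":
    # Maintainer's setup from the thread: 32 GPUs x 128 samples each.
    print(global_batch_size(32, 128))  # 4096
    # Hypothetical reading of the reporter's setup: 256 split over two V100s.
    print(per_gpu_batch_size(2, 256))  # 128
```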