Increasing batch size #82

knightron0 · 2024-02-06T06:55:08Z

I'm trying to run ResNet50 pretraining on my own data and keep running into out-of-memory errors.

Initially, I was using two V100s (32 GB) and the maximum batch size I could reach was 256. However, I can't go higher even on GPUs with more memory: I tried an A100 in both the 40GB and 80GB variants, and the maximum batch size I could use without running out of memory was still 256.

I'm a bit confused and was wondering if there's a gap in my understanding; let me know if I'm missing anything!

keyu-tian · 2024-03-11T15:36:43Z

Hi @knightron0, if a batch size of 256 maxes out a 32GB V100, then a 40GB A100 should behave similarly.
FYI: we use 32 x 80GB A100s for ResNet50 pretraining, with a single-GPU batch size of 128, and that was OK.
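
The numbers in this thread are easier to compare once the batch size is expressed per GPU. Below is a minimal sketch of that arithmetic, assuming plain data parallelism where the global batch is split evenly across GPUs. The helper names `total_batch` and `split_batch` are hypothetical and not part of the repository. It is also an assumption that the "single batch size 128" above means 128 samples per GPU. The thread does not say whether the 256 in the question is a global or a per-GPU figure, so the second call shows only one possible reading.

```python
def total_batch(per_gpu_batch: int, num_gpus: int) -> int:
    """Global batch size when every GPU processes per_gpu_batch samples."""
    if per_gpu_batch <= 0 or num_gpus <= 0:
        raise ValueError("per_gpu_batch and num_gpus must be positive")
    return per_gpu_batch * num_gpus


def split_batch(global_batch: int, num_gpus: int) -> int:
    """Per-GPU batch size when a global batch is split evenly across GPUs."""
    if global_batch <= 0 or num_gpus <= 0:
        raise ValueError("global_batch and num_gpus must be positive")
    if global_batch % num_gpus != 0:
        raise ValueError("global_batch must be divisible by num_gpus")
    return global_batch // num_gpus


# Assumed reading of the reply: 32 GPUs, 128 samples on each.
print(total_batch(128, 32))

# One possible reading of the question: 256 as a global batch over two V100s.
print(split_batch(256, 2))
```

Under these assumptions, what decides whether a step fits in memory is the per-GPU figure, so that is the number to compare between the V100 and A100 runs.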
