Largest batch size for stage3 training #11
Hi! 64 seems quite low as a batch size. We use 1024 for ViT-L and 512 for the larger models. If I had to guess, something like 256 would probably also work, but I think going even lower could impair performance quite a bit.

If memory is the limit, you can freeze the first couple of blocks and use fewer ID heads, which only degrades performance slightly (e.g. 4 heads instead of 8, Table 1a in the paper; freezing 12 blocks, Table 8 in the paper). You can then max out the available GPU memory: without freezing or fewer heads you should be able to fit a batch size of around 100 into 80 GB, and with these two changes a batch size of around 256, which should be fine for training. We also freeze the first 6 blocks for ViT-2B because of memory issues; simply add the corresponding setting to your config.

If none of these suggestions suffice, you can reduce the number of local crops to 8 or 6 (it's 10 by default). This can be done like this:
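A rough sketch of the idea in plain torchvision terms (the actual setting in this repo lives in its config files; the crop sizes, scales and names below are illustrative, not the repo's real options):

```python
from torchvision import transforms

# Illustrative DINO-style multi-crop pipeline with fewer local views.
# Cutting local crops from the default 10 to 6 reduces activation memory.
NUM_LOCAL_CROPS = 6

global_crop = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),   # 2 global views at 224px
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
local_crop = transforms.Compose([
    transforms.RandomResizedCrop(96, scale=(0.05, 0.4)),    # small local views at 96px
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def multi_crop(img):
    """Return 2 global views + NUM_LOCAL_CROPS local views of a PIL image."""
    return [global_crop(img) for _ in range(2)] + \
           [local_crop(img) for _ in range(NUM_LOCAL_CROPS)]
```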
Some additional techniques to limit memory consumption (that are not implemented in this codebase) would be to apply light masking (e.g. discard 25% of the image patches in the 2 global views) or gradient checkpointing.

My first approach would be to freeze 6 layers, use 4 heads and train with the batch size that then fits (I'd guess ~150). My second try would be to freeze 12 layers, use 4 heads, 6 local crops and the maximum possible batch size with that (I'd guess ~250).
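As a minimal sketch of what freezing the first blocks and gradient checkpointing could look like on a timm-style ViT (timm is used here purely for illustration; this repo's model classes may expose these switches differently):

```python
import timm
import torch

# Illustrative only: freeze the first 6 transformer blocks and enable
# gradient checkpointing to trade compute for activation memory.
model = timm.create_model("vit_large_patch16_224", pretrained=False)

# Freeze the first 6 blocks (per the reply, freezing early blocks costs
# little performance; see Table 8 in the paper).
for block in model.blocks[:6]:
    for p in block.parameters():
        p.requires_grad = False

# Gradient checkpointing recomputes activations in the backward pass
# instead of storing them; timm exposes this via set_grad_checkpointing.
model.set_grad_checkpointing(True)

# Only pass the remaining trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```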
Thanks a lot for your suggestions. They are very helpful!
Hi! I am trying to set up stage 3 MIM-Refiner training with images of size (3, 224, 224) and found that I have to reduce the batch size to 64 to avoid OOM errors. I am using an A100 with 80 GB of memory. Is this an expected batch size?
I am asking because I saw that the default value for the batch size in stage 3 is 1024, which is also the value provided in the MIM-Refiner paper. So I thought it would be better to double-check in case I set up any configs wrong.
Thanks!