Thank you for your excellent paper and open-source code. When running instruction tuning of the TimeChat model on 4 × V100 GPUs, I keep `world_size==4` and `accum_grad_iters==8` unchanged, but whether `batch_size_train` is set to 1 or 2, the memory usage looks the same: nearly all of the memory on every V100 is filled. What is the reason for this? Thanks a lot!
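One way to debug this is to distinguish the memory PyTorch has actually allocated (model weights, optimizer state, and activations, where only the activations scale with `batch_size_train`) from the memory reserved by the caching allocator, which is roughly what nvidia-smi reports and often sits near the GPU limit regardless of batch size. Below is a minimal sketch, assuming a standard PyTorch training setup; the helper name and the commented loop variables are illustrative, not part of the TimeChat codebase.

```python
import torch

def log_gpu_memory(tag: str, device: int = 0) -> None:
    """Print allocated vs. reserved CUDA memory (in GiB) for one device."""
    allocated = torch.cuda.memory_allocated(device) / 2**30      # live tensors right now
    peak = torch.cuda.max_memory_allocated(device) / 2**30       # peak since last reset
    reserved = torch.cuda.memory_reserved(device) / 2**30        # caching-allocator pool (~ what nvidia-smi shows)
    print(f"[{tag}] allocated={allocated:.2f} GiB  peak={peak:.2f} GiB  reserved={reserved:.2f} GiB")

# Hypothetical usage inside one training iteration:
# torch.cuda.reset_peak_memory_stats()
# loss = model(batch).loss
# loss.backward()
# log_gpu_memory(f"batch_size={batch_size_train}")
```

If the peak allocated memory grows when moving from batch size 1 to 2 while reserved memory stays flat, the GPUs are simply showing the allocator's cached pool; note also that `accum_grad_iters` only changes how often the optimizer steps, not the per-iteration memory footprint.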