Thank you for your excellent paper and open-source code. When running instruction tuning of the TimeChat model on 4 × V100 GPUs, I keep `world_size==4` and `accum_grad_iters==8` unchanged, but whether `batch_size_train` is set to 1 or 2, the memory usage looks the same: nearly all of the memory on every V100 is filled. What is the reason for this? Thanks a lot!
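One way to debug this is to distinguish the memory PyTorch has actually allocated (model weights, optimizer state, and activations, where only the activations scale with `batch_size_train`) from the memory reserved by the caching allocator, which is roughly what nvidia-smi reports and often sits near the GPU limit regardless of batch size. Below is a minimal sketch, assuming a standard PyTorch training setup; the helper name and the commented loop variables are illustrative, not part of the TimeChat codebase.

```python
import torch

def log_gpu_memory(tag: str, device: int = 0) -> None:
    """Print allocated vs. reserved CUDA memory (in GiB) for one device."""
    allocated = torch.cuda.memory_allocated(device) / 2**30      # live tensors right now
    peak = torch.cuda.max_memory_allocated(device) / 2**30       # peak since last reset
    reserved = torch.cuda.memory_reserved(device) / 2**30        # caching-allocator pool (~ what nvidia-smi shows)
    print(f"[{tag}] allocated={allocated:.2f} GiB  peak={peak:.2f} GiB  reserved={reserved:.2f} GiB")

# Hypothetical usage inside one training iteration:
# torch.cuda.reset_peak_memory_stats()
# loss = model(batch).loss
# loss.backward()
# log_gpu_memory(f"batch_size={batch_size_train}")
```

If the peak allocated memory grows when moving from batch size 1 to 2 while reserved memory stays flat, the GPUs are simply showing the allocator's cached pool; note also that `accum_grad_iters` only changes how often the optimizer steps, not the per-iteration memory footprint.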