This repository has been archived by the owner on Oct 31, 2022. It is now read-only.

What is the minimum size of GPU I need to set batch_size more than 1 to train 345M model using train.py? #52

Open
shamiul94 opened this issue Jun 19, 2020 · 0 comments

shamiul94 commented Jun 19, 2020

  1. I am using an ml.p3.2xlarge instance on AWS with a single 16 GB V100 GPU. Training the 345M model with batch_size 2 produces an out-of-memory (OOM) error, although batch_size 1 works.
  2. I would like to use a batch_size of 2 to 8. How much GPU memory do I need to make this happen? If anyone has run into this situation, sharing your experience would be helpful.
  3. This is the command I am using to train:
python train.py --dataset Dataset/data.npz --sample_every 10 --sample_num 3 --batch_size 1 --learning_rate 0.0001 --model_name 345M
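For reference, here is my own back-of-envelope estimate of the static memory footprint (an assumption on my part, not taken from the repo), assuming fp32 weights plus gradients plus Adam's two moment buffers:

```python
# Rough static memory floor for training the 345M model with Adam in fp32.
# Per parameter: 4 B weights + 4 B gradients + 8 B Adam moments (m and v).
PARAMS = 345_000_000
BYTES_PER_PARAM = 4 + 4 + 8

static_gib = PARAMS * BYTES_PER_PARAM / 2**30
print(f"static footprint: {static_gib:.1f} GiB")  # prints "static footprint: 5.1 GiB"
```

Activation memory comes on top of this and scales roughly linearly with batch size and sequence length, which might explain why batch_size 1 fits on a 16 GB card but batch_size 2 does not.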