multiple GPUs do not reduce training time #89
Comments
This is bad. We'll look into it.
This has long been the case and is basically related to the architecture of [...]. It's worth also trying [...].
Thanks, we'll give this a try, or take a PR, @nicolabertoldi, if you are interested.
Closing; multi-GPU is now implemented: roughly 3x speedup on 4 GPUs.
I am trying to use multiple GPUs for training, but I am not able to reduce the training time.
I have a machine with 3 GPUs (GeForce GTX 1080), and I train a network (details are below).
I tried different numbers of GPUs (1, 2, or 3) and different values of batch_size (64, 128, 192, 248).
Here is a table reporting the time of one epoch for each configuration:

| batch_size | 1 GPU | 2 GPUs | 3 GPUs |
|-----------:|------:|-------:|-------:|
| 64         | 43 s  | 78 s   | 94 s   |
| 128        | 35 s  | 51 s   | 60 s   |
| 192        | 32 s  | 43 s   | 50 s   |
| 248        | 30 s  | 40 s   | 44 s   |
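As a quick sanity check (not part of the original report), the numbers above can be turned into speedups relative to the single-GPU run; every multi-GPU configuration comes out below 1.0x, and the effective per-GPU batch shrinks as GPUs are added:

```python
# Speedup of multi-GPU runs relative to 1 GPU, computed from the epoch times
# reported in the table above (seconds per epoch).
epoch_seconds = {
    # batch_size: {num_gpus: seconds}
    64:  {1: 43, 2: 78, 3: 94},
    128: {1: 35, 2: 51, 3: 60},
    192: {1: 32, 2: 43, 3: 50},
    248: {1: 30, 2: 40, 3: 44},
}

for batch_size, times in epoch_seconds.items():
    for num_gpus in (2, 3):
        speedup = times[1] / times[num_gpus]
        per_gpu_batch = batch_size / num_gpus
        print(f"batch_size={batch_size:>3} gpus={num_gpus}: "
              f"speedup={speedup:.2f}x, ~{per_gpu_batch:.0f} examples per GPU")
```

For example, batch_size=248 on 3 GPUs gives 30/44 ≈ 0.68x, i.e. slower than a single GPU.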
I also noticed that GPU utilization is quite low when multiple GPUs are used:

- 1 GPU: 80-90% utilization
- 2 GPUs: 45-55% utilization
- 3 GPUs: 35-45% utilization
I am using these settings (gpus and batch_size vary according to the experiment):
Namespace(batch_size=128, brnn=False, brnn_merge='concat', context_gate=None, curriculum=False, data='debugging/model.train.pt', dropout=0.3, encoder_type='text', epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, rnn_type='LSTM', save_model='debugging/model', seed=-1, start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)
and this commit 58c8b52
Why does training speed not scale with the number of GPUs? If anything, adding GPUs seems to slow training down.
Have you already noticed this behavior? Am I doing anything wrong?
Any comment is welcome.
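One plausible explanation for this pattern, and possibly what the truncated comment earlier in the thread alludes to, is single-process data parallelism in the style of torch.nn.DataParallel: the model is replicated to every GPU and the batch is scattered across them on each step, so each GPU does less work while the replication/scatter/gather overhead and the single Python process stay fixed. The sketch below is a minimal, hypothetical illustration of that pattern in modern PyTorch; it is not the actual code path at commit 58c8b52.

```python
# Minimal sketch of single-process data parallelism (torch.nn.DataParallel).
# Assumption: the multi-GPU path behaves like this pattern; the thread does
# not confirm the exact mechanism used by the project at this commit.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(500, 500), nn.ReLU(), nn.Linear(500, 500))
criterion = nn.MSELoss()

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # Every forward pass replicates the model on each GPU, scatters the batch,
    # runs the replicas, then gathers outputs back to GPU 0. All of this is
    # per-step overhead that a single-GPU run does not pay.
    model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

batch_size = 248  # largest batch_size used in the experiments above
x = torch.randn(batch_size, 500, device=device)
y = torch.randn(batch_size, 500, device=device)

for step in range(10):
    optimizer.zero_grad()
    # With 3 GPUs each replica only sees ~batch_size / 3 ≈ 83 examples, so
    # per-GPU compute shrinks while replication, scatter/gather, and the
    # single Python process stay constant -- utilization drops.
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```

If this is indeed the bottleneck, the usual remedies are larger per-GPU batches or one process per GPU (DistributedDataParallel-style); whether either applies here depends on the actual multi-GPU implementation in the project.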