
Broken parallelization #16 (Open)

Svito-zar opened this issue Jun 6, 2019 · 4 comments

Comments

Svito-zar commented Jun 6, 2019

When I try to run the model on several GPUs, I get a numerical error:

Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.
Warning: NaN or Inf found in input tensor.

When running on a single GPU, everything works just fine.

That indicates there is an issue with the parallelization.
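Not a fix, but a minimal sketch of how the NaN/Inf could be localized, assuming a standard PyTorch training loop (the `x`/`y_onehot` names follow the `trainer.py` call quoted below in this thread):

```python
import torch

# Debug only: makes the backward pass report the op that produced NaN/Inf (slow).
torch.autograd.set_detect_anomaly(True)

def assert_finite(name, t):
    # Fail fast instead of letting NaN/Inf propagate silently through the graph.
    if not torch.isfinite(t).all():
        raise RuntimeError(f"{name} contains NaN or Inf")

# Inside the training loop, around the forward pass:
# assert_finite("x", x)
# assert_finite("y_onehot", y_onehot)
# z, nll, y_logits = self.graph(x=x, y_onehot=y_onehot)
# assert_finite("nll", nll)
```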

@zain-ul-abedien

Hey @Svito-zar, I am training on a single GPU but I get the same warning (`Warning: NaN or Inf found in input tensor`). Please guide me on how to solve this problem.
[Screenshot from 2019-07-19 14-14-50]

@Svito-zar (Author)

I didn't have your problem and I don't know how to fix it either.
I would be interested to know the solution as well.

What I find weird from the machine learning perspective is that your batch_size is very small. It causes the gradient to vary a lot, which might lead to numerical instabilities. So I would try much larger batch sizes: at least 20, better 50.
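(Purely illustrative, not from this repo: a tiny toy experiment showing that gradient noise shrinks as the batch size grows, which is the instability argument above.)

```python
import torch

torch.manual_seed(0)
X = torch.randn(10_000, 32)
true_w = torch.randn(32, 1)
y = X @ true_w + 0.1 * torch.randn(10_000, 1)

def grad_std(batch_size, n_batches=200):
    # Standard deviation of the loss gradient across random mini-batches.
    w = torch.zeros(32, 1, requires_grad=True)
    grads = []
    for _ in range(n_batches):
        idx = torch.randint(0, X.shape[0], (batch_size,))
        loss = ((X[idx] @ w - y[idx]) ** 2).mean()
        g, = torch.autograd.grad(loss, w)
        grads.append(g.flatten())
    return torch.stack(grads).std(dim=0).mean().item()

print("batch_size=4 :", grad_std(4))    # noisy gradients
print("batch_size=64:", grad_std(64))   # much less variance
```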

@hologerry

I have the same problem with a large batch_size of 64. Have you guys found a solution?
Help, please.


pptrick commented Feb 25, 2021

I found some problems with parallelization too. When I try to run the model on more than one GPU, the process just freezes at the forward stage, namely this line in trainer.py:
z, nll, y_logits = self.graph(x=x, y_onehot=y_onehot)
The program is still running, but I can't see any output after this line. However, one GPU works fine.
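Not the repo's code, just a quick isolation test I would try (assuming the parallelization goes through `torch.nn.DataParallel`): if even this tiny forward hangs on the same machine, the freeze is in the multi-GPU setup (e.g. GPU peer-to-peer copies), not in the Glow model itself.

```python
import torch
import torch.nn as nn

# Minimal multi-GPU forward: replicate a trivial module across all visible GPUs.
model = nn.DataParallel(nn.Linear(16, 16)).cuda()
x = torch.randn(8, 16).cuda()
print(model(x).shape)  # should print torch.Size([8, 16]) almost immediately
```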
