
GPU results are non-deterministic #11057

Closed
daming-lu opened this issue May 30, 2018 · 9 comments


daming-lu commented May 30, 2018

When we stabilize all the randomness and run the same training twice on GPU (the same GPU device), the results differ by a tiny amount. This does NOT happen on CPU.

See the attachments. The demo code is in this PR

vimdiff w2v_nocuda_t1.txt w2v_nocuda_t2.txt  // no difference

vimdiff w2v_t1_cuda.txt w2v_t2_cuda.txt  // has some tiny difference

w2v_nocuda_t1.txt
w2v_nocuda_t2.txt
w2v_t1_cuda.txt
w2v_t2_cuda.txt
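For reference, a minimal sketch of what "stabilizing all the randomness" can mean in practice. The `fix_seeds` helper and the seed value are illustrative, not from the original report; a real training script would also need to seed the framework's own generators:

```python
import random
import numpy as np

def fix_seeds(seed=42):
    """Seed every host-side RNG the run touches (illustrative helper)."""
    random.seed(seed)
    np.random.seed(seed)

fix_seeds()
first = np.random.rand(3)
fix_seeds()
second = np.random.rand(3)
# With identical seeds, host-side draws match exactly.
print(np.array_equal(first, second))  # True
```

Even with every seed fixed like this, the GPU runs still diverge, which is what points the finger at the kernels rather than the RNG setup.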


sidgoyal78 commented May 30, 2018

I think there is probably some issue with the "embedding" layer (and/or the "lookup_table_op"). Even this sentiment analysis code (https://github.com/sidgoyal78/Paddle/blob/a801e7bcb2f4f1e131f6a640ecd84a03d21588ff/test_sa_conv.py) produces inconsistent results when run with the same seed on GPU.

But if the same code is run on CPU, then it yields consistent results.

@wangkuiyi wangkuiyi added the Bug label May 30, 2018

daming-lu commented May 30, 2018

#10405 (related issue)


chengduoZH commented May 31, 2018

@daming-lu @sidgoyal78
We have noticed this phenomenon, and we have found that the results of some operations on GPU are non-deterministic, such as cross_entropy and some cuDNN operations.
Other frameworks have the same issue, e.g. TensorFlow (tensorflow/tensorflow#2732) and PyTorch (soumith/cudnn.torch#270). I have seen the same question on the NVIDIA forums too.

dzhwinter (Contributor) commented:

This bug has been located. If a kernel uses CudaAtomicAdd, the accumulation order varies between runs, and because floating-point addition is not associative, we get non-deterministic results.
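The effect described above can be reproduced on the CPU in pure Python: floating-point addition is not associative, so summing the same values in a different order (which is exactly what concurrent atomicAdd calls do on the GPU) can change the result. The values below are illustrative:

```python
# Floating-point addition is not associative:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False

# The same multiset of values summed in two different orders, as racing
# atomicAdd calls might accumulate them, yields two different results:
print(sum([1e16, 1.0, -1e16]))  # 0.0  (the 1.0 is absorbed by 1e16)
print(sum([1e16, -1e16, 1.0]))  # 1.0
```

On GPU the order in which threads win the atomic update is scheduling-dependent, so each run can effectively sum in a different order.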

dzhwinter (Contributor) commented:

That is also confirmed by our experiments in the benchmark-ops work: #10646

dzhwinter (Contributor) commented:

Here is Siddharth's reproduction PR: #11133

@emailweixu emailweixu changed the title GPU results are not consistent GPU results are non-deterministic Jun 7, 2018
daming-lu (Contributor, Author) commented:

@dzhwinter @chengduoZH : Thanks for the updates! One question: do we have a plan to fix it? As you know, Baidu is a major contributor to MLPerf and we want to get performance metrics for our own PaddlePaddle framework 😀

shanyi15 (Collaborator) commented:

Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up after it is closed, please feel free to reopen it and we will get back to you within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!

@lucywsq lucywsq reopened this Jan 4, 2019
paddle-bot-old commented:

Since there has been no reply for more than a year, we are closing this issue/PR.
If the problem is not solved or a follow-up question arises, please reopen it at any time and we will continue to follow up.
