GPU results are non-deterministic #11057
I think there is probably an issue with the "embedding" layer (and/or lookup_table_op). Even this sentiment analysis code (https://github.com/sidgoyal78/Paddle/blob/a801e7bcb2f4f1e131f6a640ecd84a03d21588ff/test_sa_conv.py) produces inconsistent results when run with the same seed on GPU, but the same code run on CPU yields consistent results.
Related issue: #10405
@daming-lu @sidgoyal78
This bug has been located. If the kernel using
That is also confirmed by our experiments in the benchmark ops work: #10646
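The non-determinism described above is consistent with a kernel that accumulates gradients via floating-point atomicAdd: addition of floats is not associative, so when the hardware schedules the atomic updates in a different order between runs, the low-order bits of the result change. The sketch below illustrates the mechanism in plain Python; the shuffle is only a hypothetical stand-in for GPU thread scheduling, not PaddlePaddle code.

```python
import random

# Non-associativity: the same three numbers summed in two orders differ
# in IEEE-754 double precision.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Many-term accumulation: reordering the additions (a stand-in for the
# order in which GPU threads win the atomicAdd) can perturb the sum's
# low-order bits, which then compound over training steps.
rng = random.Random(0)
grads = [rng.uniform(-1.0, 1.0) for _ in range(100000)]
ordered = sum(grads)

shuffled = grads[:]
rng.shuffle(shuffled)
reordered = sum(shuffled)
print(abs(ordered - reordered))  # typically a tiny nonzero difference
```

This also explains why the CPU runs match: a single-threaded CPU kernel accumulates in a fixed order, so the rounding errors are reproduced identically each run.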
Here is Siddharth's reproduction PR: #11133
@dzhwinter @chengduoZH: Thanks for the updates! One question: do we have a plan to fix it? As you know, Baidu is a major contributor to MLPerf and we want to get performance metrics for our own PaddlePaddle framework 😀
Hello, this issue has had no updates for nearly a month, so we will close it today. If you still need to follow up after it is closed, you can reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure. Thank you for your support of PaddlePaddle!
Since you haven't replied for more than a year, we have closed this issue/PR.
When we stabilize all the randomness and run the same training twice on GPU (same GPU core), the results differ by a tiny amount. This does NOT happen on CPU.
See the attachments. The demo code is in this PR.
w2v_nocuda_t1.txt
w2v_nocuda_t2.txt
w2v_t1_cuda.txt
w2v_t2_cuda.txt
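To quantify the "tiny precision" drift between two runs, one can extract the numeric values from a pair of logs (e.g. w2v_t1_cuda.txt vs. w2v_t2_cuda.txt) and report the largest elementwise difference. The attachment format is not specified here, so the sketch below assumes the logs contain decimal floats and uses a regex heuristic; the inline log excerpts are hypothetical stand-ins for the real files. A result of exactly 0.0 would indicate bitwise-identical runs, as seen on CPU.

```python
import re

# Matches decimal floats like 4.712301 or 1.5e-3 (heuristic; integers
# such as step counters are deliberately not matched).
FLOAT_RE = re.compile(r'[-+]?\d+\.\d+(?:[eE][-+]?\d+)?')

def extract_floats(text):
    return [float(m) for m in FLOAT_RE.findall(text)]

def max_abs_diff(text_a, text_b):
    """Largest elementwise gap between the floats reported in two logs."""
    a, b = extract_floats(text_a), extract_floats(text_b)
    if len(a) != len(b):
        raise ValueError("logs report different numbers of values")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)

# Hypothetical excerpts standing in for two CUDA training logs:
run1 = "step 10 loss 4.712301\nstep 20 loss 4.550982"
run2 = "step 10 loss 4.712301\nstep 20 loss 4.550984"
print(max_abs_diff(run1, run2))
```

For real use, read the two attachment files with `open(...).read()` and pass their contents to `max_abs_diff`.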