

ElemwiseGradCompute is non-deterministic #10122

Closed
panyx0718 opened this issue Apr 22, 2018 · 1 comment

panyx0718 commented Apr 22, 2018

ElemwiseGradCompute

With identical inputs (x, y, out, dout), ElemwiseGradCompute returns the same dx but a different dy from call to call. Two calls with the same inputs:

x: 3398586.250000
y: 0.000000
out: 3398586.250000
E0422 12:33:49.270020 35231 elementwise_op_function.h:524] 2, 32, 220, 220 32
dout: 138643.281250
dx: 138643.281250
dy: 246.410400

x: 3398586.250000
y: 0.000000
out: 3398586.250000
E0422 12:33:52.696183 35231 elementwise_op_function.h:524] 2, 32, 220, 220 32
dout: 138643.281250
dx: 138643.281250
dy: 264.202179
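The log line `2, 32, 220, 220 32` appears to give the shapes of x and y. Assuming (an interpretation, not confirmed in the issue) that out = x + y with x of shape (2, 32, 220, 220) and y of shape (32,) broadcast along axis 1, dx is simply dout, while each dy[j] is a reduction of dout over 2 × 220 × 220 = 96,800 elements. That would fit the symptom: dx matches dout on both calls and only the reduced dy differs. The Python sketch below uses a hypothetical `broadcast_add_grad` helper with a pre/n/post flattening (names chosen for illustration, not taken from Paddle) and sums in a fixed loop order, so dy is reproducible; a kernel in which many threads accumulate into dy[j] in a scheduler-dependent order (for example, with atomic adds) gives no such guarantee.

```python
def broadcast_add_grad(dout, pre, n, post):
    """Hypothetical sketch: gradients of out = x + y, where x has shape
    (pre, n, post) flattened row-major and y has shape (n,), broadcast
    along the middle axis.

    dx is dout itself. dy[j] sums dout over the pre and post axes in a
    fixed loop order, so repeated calls return bit-identical results.
    """
    assert len(dout) == pre * n * post
    dx = list(dout)
    dy = [0.0] * n
    for i in range(pre):
        for j in range(n):
            base = (i * n + j) * post  # start of the (i, j) row of length post
            for k in range(post):
                dy[j] += dout[base + k]
    return dx, dy


dout = [float(v) for v in range(1, 13)]  # pre=2, n=3, post=2
dx, dy = broadcast_add_grad(dout, pre=2, n=3, post=2)
print(dy)  # [18.0, 26.0, 34.0]
```

A fixed order trades some parallelism for reproducibility; a deterministic GPU variant would typically give each dy[j] its own fixed reduction tree instead of unordered accumulation.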

To reproduce, check out this branch and run the test below:
https://github.com/panyx0718/Paddle/tree/qingqing01-parallel_do_and_exe_compare2
CUDA_VISIBLE_DEVICES=3 ctest -R test_parallel_executor_grad

panyx0718 commented Apr 22, 2018

I noticed this when I ran the same model twice with the same input and seed but got different results.

It leads to different convergence paths for relatively deep models such as SE-ResNeXt.

I'm not sure why it happens. One guess: if a floating-point aggregation (e.g. sum(tensor)) runs in a nondeterministic order, the result can differ between runs, because floating-point addition is not associative and each order accumulates rounding error differently.
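A minimal sketch of that hypothesis in Python, for illustration only: `to_f32` and `sum_f32` are hypothetical helpers (not Paddle APIs) that emulate a float32 accumulator by rounding after every addition. The same three values summed in two orders give different totals, and shuffling a larger list mimics a reduction whose order depends on thread scheduling.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate sequentially, rounding to float32 after every add,
    to emulate a single-precision accumulator."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


# Same three numbers, two orders. Near 1e8 the float32 spacing is 8,
# so adding 1.0 to 1e8 rounds back to 1e8 and the 1.0 is lost.
print(sum_f32([1e8, 1.0, -1e8]))  # 0.0
print(sum_f32([1e8, -1e8, 1.0]))  # 1.0

# Many values summed in several random orders, as an unordered
# parallel reduction might visit them.
rng = random.Random(0)
grads = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
totals = set()
for _ in range(5):
    order = grads[:]
    rng.shuffle(order)
    totals.add(sum_f32(order))
print(len(totals))  # number of distinct totals; usually greater than 1
```

The differences are tiny per step, but as noted above they can steer a deep model onto a different convergence path.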
