ElemwiseGradCompute is non-deterministic #10122
Comments
I noticed this when I ran the same model twice with the same input and seed but got different results. It leads to different convergence paths for relatively deep models, such as se-resnext. Not sure why it happens. A guess: if the float aggregation (e.g. sum(tensor)) runs in a random order, the result could differ from run to run due to precision loss.
ElemwiseGradCompute
With the same inputs, dy can be different at different times.
x: 3398586.250000
y: 0.000000
out: 3398586.250000
E0422 12:33:49.270020 35231 elementwise_op_function.h:524] 2, 32, 220, 220 32
dout: 138643.281250
dx: 138643.281250
dy: 246.410400
x: 3398586.250000
y: 0.000000
out: 3398586.250000
E0422 12:33:52.696183 35231 elementwise_op_function.h:524] 2, 32, 220, 220 32
dout: 138643.281250
dx: 138643.281250
dy: 264.202179
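In the two runs above, `dx` matches and only `dy` differs. The shapes in the log line (`2, 32, 220, 220` and `32`) suggest, though the issue does not say so, that `y` is broadcast along the channel axis, in which case `dy` would be a reduction of `dout` over the other three axes. Under that assumption, the sketch below shows one way to make such a reduction reproducible. `grad_y_deterministic` is a hypothetical helper, and `math.fsum` stands in for any fixed-order reduction; this is not how Paddle computes the gradient.

```python
import math

def grad_y_deterministic(dout):
    """Reduce dout, nested lists shaped [N][C][H][W], to dy shaped [C].

    Assumes y was broadcast along the channel axis, so each dy[c] is the
    sum of every dout[n][c][h][w]. The contributions are gathered in a
    fixed order and summed with math.fsum, so the result does not depend
    on scheduling.
    """
    n_batch = len(dout)
    n_chan = len(dout[0])
    dy = []
    for c in range(n_chan):
        contributions = [
            v
            for n in range(n_batch)
            for row in dout[n][c]
            for v in row
        ]
        dy.append(math.fsum(contributions))
    return dy

# Tiny made-up example with N=2, C=2, H=1, W=2.
dout = [
    [[[1e16, 1.0]], [[0.5, 0.25]]],
    [[[-1e16, 1.0]], [[0.125, 0.125]]],
]
dy = grad_y_deterministic(dout)
print(dy)  # [2.0, 1.0]
```

A naive left-to-right sum over channel 0 in the order the values are stored would lose one of the `1.0` contributions, which is the kind of discrepancy that an unordered reduction could expose between runs.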
reproduce
https://github.com/panyx0718/Paddle/tree/qingqing01-parallel_do_and_exe_compare2
CUDA_VISIBLE_DEVICES=3 ctest -R test_parallel_executor_grad