ElemwiseGradCompute is non-deterministic #10122

panyx0718 · 2018-04-22T12:37:03Z

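With the same inputs, ElemwiseGradCompute can produce a different dy on different runs. In the two runs below, x, y, out, dout and dx are identical, but dy is 246.410400 in the first run and 264.202179 in the second.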

x: 3398586.250000
y: 0.000000
out: 3398586.250000
E0422 12:33:49.270020 35231 elementwise_op_function.h:524] 2, 32, 220, 220 32
dout: 138643.281250
dx: 138643.281250
dy: 246.410400

x: 3398586.250000
y: 0.000000
out: 3398586.250000
E0422 12:33:52.696183 35231 elementwise_op_function.h:524] 2, 32, 220, 220 32
dout: 138643.281250
dx: 138643.281250
dy: 264.202179

To reproduce, use this branch:
https://github.com/panyx0718/Paddle/tree/qingqing01-parallel_do_and_exe_compare2
and run:
CUDA_VISIBLE_DEVICES=3 ctest -R test_parallel_executor_grad

panyx0718 · 2018-04-22T12:39:55Z

I noticed this when I ran the same model twice with the same input and seed but got different results.

It leads to different convergence paths for relatively deep models, such as se-resnext.

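I am not sure why it happens. My guess: if a floating-point aggregation (e.g. sum(tensor)) accumulates its terms in a nondeterministic order, the result can differ between runs, because floating-point addition is not associative and the rounding error depends on the order.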
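To illustrate the guess above (this is not the Paddle kernel, only a minimal sketch), the snippet below emulates float32 accumulation in plain Python and sums the same three values in two different orders. The helper names to_f32 and sum_f32 are hypothetical and exist only for this example.

```python
import struct


def to_f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Sum left to right, rounding to float32 after every addition."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


# Same three numbers, two accumulation orders.
order_a = [1e8, 1.0, -1e8]   # the 1.0 is absorbed by 1e8 before the cancellation
order_b = [1e8, -1e8, 1.0]   # the large terms cancel first, so the 1.0 survives

result_a = sum_f32(order_a)
result_b = sum_f32(order_b)

print("order a:", result_a)
print("order b:", result_b)
```

Near 1e8 the spacing between adjacent float32 values is 8, so adding 1.0 to 1e8 is rounded away. If a gradient reduction on the GPU accumulates its partial sums in an order that varies from run to run, the same effect could plausibly explain a dy that differs between runs, though that is an assumption about the cause and not something verified here.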

panyx0718 assigned chengduoZH Apr 22, 2018

chengduoZH mentioned this issue Apr 24, 2018

Fix elementwise_gradient bug #10150

Merged

chengduoZH closed this as completed Apr 24, 2018
