Wrong gradients on Windows-GPU #20471
I see the same sign flip with this other symbol (which can be fed to the same script above), and also with this one.
Output is:
@matteosal Which version of MXNet did you use?
With your sym3 example, here is what I got with MXNet 1.9 on Linux. I'm not sure whether this issue only occurs on Windows. @matteosal, did you try it on Linux?
I am using version 2.0, built from source at commit fabcd14.
@matteosal Thanks for the update. @leezu Do you have a Windows machine to help triage the problem?
I'm not a Windows user, so it's very hard for me to get MXNet running on Windows. @yajiedesign is a Windows expert; maybe he can help.
I've tested on Windows with a 2.0 build that I modified myself, and it works:
Input + Target gradient, CPU (OK):
{'.Inputs.Input':
[[-0.33333334 -0.33333334 -0.33333334]]
<NDArray 1x3 @cpu(0)>, '.Inputs.Target':
[[0.33333334 0.33333334 0.33333334]]
<NDArray 1x3 @cpu(0)>, 'seq_715248120': None}
Input + Target gradient, GPU (OK):
{'.Inputs.Input':
[[-0.33333334 -0.33333334 -0.33333334]]
<NDArray 1x3 @gpu(0)>, '.Inputs.Target':
[[0.33333334 0.33333334 0.33333334]]
<NDArray 1x3 @gpu(0)>, 'seq_715248120': None}
Target gradient only, CPU (OK):
{'.Inputs.Input': None, '.Inputs.Target':
[[0.33333334 0.33333334 0.33333334]]
<NDArray 1x3 @cpu(0)>, 'seq_715248120': None}
Target gradient only, GPU (WRONG):
{'.Inputs.Input': None, '.Inputs.Target':
[[0.33333334 0.33333334 0.33333334]]
<NDArray 1x3 @gpu(0)>, 'seq_715248120': None}
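To decide which sign is correct without trusting either backend, one can compare against central finite differences. The sketch below is a minimal illustration under a loud assumption: it pretends the network computes mean(Target minus Input) over 3 elements, which is consistent with the ±0.333 values above, but the real symbol in sym.zip may differ. The names `loss` and `numeric_grad` are hypothetical helpers, not MXNet APIs.

```python
# HYPOTHETICAL loss standing in for the real symbol: mean(target - input).
def loss(inp, tgt):
    return sum(t - i for i, t in zip(inp, tgt)) / len(inp)


def numeric_grad(f, xs, eps=1e-4):
    """Central-difference gradient of scalar function f w.r.t. each entry of xs."""
    grads = []
    for k in range(len(xs)):
        up = list(xs)
        up[k] += eps
        down = list(xs)
        down[k] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


inp = [0.1, 0.2, 0.3]
tgt = [0.5, 0.4, 0.9]
g_inp = numeric_grad(lambda x: loss(x, tgt), inp)  # approximately -1/3 each
g_tgt = numeric_grad(lambda t: loss(inp, t), tgt)  # approximately +1/3 each
print(g_inp)
print(g_tgt)
```

Under this assumed loss, the reference gradient for Target is positive whether or not Input also requests a gradient, so a negative Target gradient from any backend would indicate the sign flip.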
@chinakook What did you modify? Is it related to this gradient issue? Could you share your changes with @matteosal?
A ping on this.
A ping on this. Could anyone please investigate?
@matteosal What build settings should we use to reproduce this issue?
@barry-jin here they are:
MKL version is 2019.4 and CUDA version is 11.4.0.
@barry-jin Any news on this? I have rebuilt with VC2019 to try to fix this issue, but I still see the problem here.
Sorry, I'm still triaging this issue. I built with the settings in build_window.py and can reproduce the issue as well.
@matteosal The current workaround is to replace 'elemwise_sub' with '_npi_subtract'. There is probably a bug in the legacy subtract operator.
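If the model only exists as an exported symbol JSON file, one way to apply this workaround is to rewrite the operator name in the graph before loading it. The sketch below assumes the usual MXNet symbol JSON layout (a top-level "nodes" list whose entries carry an "op" field); it is a hypothetical helper, not how the thread participants necessarily applied the fix, and it assumes the 'elemwise_sub' nodes carry no attributes that '_npi_subtract' would reject.

```python
import json
from typing import Tuple


def swap_op(symbol_json: str,
            old_op: str = "elemwise_sub",
            new_op: str = "_npi_subtract") -> Tuple[str, int]:
    """Rename every node using old_op to new_op; return new JSON and count."""
    graph = json.loads(symbol_json)
    count = 0
    for node in graph["nodes"]:
        if node.get("op") == old_op:
            node["op"] = new_op
            count += 1
    return json.dumps(graph), count


# Tiny HYPOTHETICAL graph: Target - Input via the legacy operator.
sample = json.dumps({
    "nodes": [
        {"op": "null", "name": "Input", "inputs": []},
        {"op": "null", "name": "Target", "inputs": []},
        {"op": "elemwise_sub", "name": "sub0", "inputs": [[1, 0, 0], [0, 0, 0]]},
    ],
    "arg_nodes": [0, 1],
    "heads": [[2, 0, 0]],
})
patched, n = swap_op(sample)
print(n, json.loads(patched)["nodes"][2]["op"])
```

The patched string could then be passed to MXNet's symbol JSON loader (for example `mx.sym.load_json`), and the gradients re-checked on Windows-GPU.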
@barry-jin Thank you, I have verified that swapping the operator fixes the problem.
sym.zip
I only see this on Windows. Download the symbol file and run this script:
Output is:
The Target gradient has the sign flipped in the last example.
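A small helper can flag this kind of sign flip automatically when comparing the gradient dictionaries printed for CPU and GPU. This is a hypothetical sketch: it assumes each gradient has already been converted to a flat Python list (e.g. via NDArray `asnumpy()` followed by `ravel().tolist()`), and entries that are None (no gradient requested) are skipped. The GPU values in the usage example are made up to illustrate a flip, not taken from the report.

```python
def sign_flips(ref, other, tol=1e-6):
    """Return names whose gradients have opposite signs in the two runs."""
    flipped = []
    for name, g_ref in ref.items():
        g_other = other.get(name)
        if g_ref is None or g_other is None:
            continue  # gradient not requested in one of the runs
        for a, b in zip(g_ref, g_other):
            # Ignore near-zero entries, where the sign is not meaningful.
            if abs(a) > tol and abs(b) > tol and (a > 0) != (b > 0):
                flipped.append(name)
                break
    return flipped


# HYPOTHETICAL values illustrating a flipped Target gradient on GPU.
cpu = {".Inputs.Input": None, ".Inputs.Target": [0.333, 0.333, 0.333]}
gpu = {".Inputs.Input": None, ".Inputs.Target": [-0.333, -0.333, -0.333]}
print(sign_flips(cpu, gpu))
```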