[Bug]: Wrong gradients in the CUDA implementation of Layer Norm #2902
Comments
Yeah, I noticed this a bit ago but didn't figure it out when I looked. I need to look again and figure it out. Although maybe you will figure it out first :) It must just be some subtle typo somewhere.
Yes, I will try to figure it out. I will check what the test does.
Yeah, hard to say. Bugs can be anywhere and you never know until after the fact 🤷 :D
It's really weird. If I change this line (line 605 in 46e59a2) to the size that test_layer uses:

resizable_tensor x(4, 2, 2, 4);

then this check (line 650 in 46e59a2) fails: the values are entirely different. But with the previous size, the values are the same.
I will tackle this again at some point. Sometimes, when things like this happen and I have absolutely no idea why, I question myself a lot...
Nah, everyone thinks that from time to time. Don't sweat it :)
I could be naive here, but is there a reason why LayerNorm isn't using cuDNN?
Oh, I wasn't aware it was possible to do Layer Normalization with cuDNN. It seems weird that this layer, now used in most Transformer-based networks, has no cuDNN implementation…
Somewhere in the docs I read it could be used for multiple types of normalization... |
Ah, maybe you're referring to this? This is for FC or Conv mode in Batch Norm, which dlib already uses. |
What Operating System(s) are you seeing this problem on?
Linux (x86-64)
dlib version
19.24
Python version
N/A
Compiler
GCC 12.3.0
Expected Behavior
Expect the tests to pass for LayerNorm, both in CPU and CUDA implementations.
Current Behavior
This test passed (I have CUDA enabled of course):
dlib/dlib/test/dnn.cpp
Lines 603 to 653 in 46e59a2
In the first part (before the #if DLIB_USE_CUDA), we check that Layer Normalization actually does what it says on CPU: it normalizes each sample. In the second part, I compute the CUDA version and check that it is equal to the CPU version.
That works. Then, I proceed to compute the gradients on both CPU and GPU and check that they are equal.
Both these tests pass. However, this one only passes on CPU and does not pass on GPU:
dlib/dlib/test/dnn.cpp
Lines 2007 to 2012 in 46e59a2
I have stared at both implementations for a while and I can't see what I am doing wrong.
If someone could take an extra look, I'd appreciate it.
Steps to Reproduce
Just run the test suite with CUDA enabled.
Anything else?
No response