ggml : testing GPU FP precision via quantized CPY #4698
Conversation
It may be due to the floating-point rounding mode. On the CPU it should be round-to-nearest by default. CUDA also supports round-to-nearest, but because we use the […]. After removing […]. With Metal it is not clear to me how to configure the rounding mode, but regardless it doesn't seem to support round-to-nearest: […]
What specifically does the test assert? That the results after quantization are exactly equal?
It uses a normalized MSE to compare the results between CPU and GPU, but the allowed error is only 1e-7. There isn't anything special about that value, it's just that most tests pass with that error.
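The metric described above can be sketched as follows. This is an editor's illustration in Python; the actual test is C++ code in `test-backend-ops`, and the function name here is made up:

```python
import numpy as np

def nmse(ref, got):
    """Normalized mean squared error: sum((ref - got)^2) / sum(ref^2)."""
    ref = np.asarray(ref, dtype=np.float64)
    got = np.asarray(got, dtype=np.float64)
    return float(np.sum((ref - got) ** 2) / np.sum(ref ** 2))

cpu = np.array([1.0, 2.0, 3.0])        # hypothetical CPU reference output
gpu = np.array([1.0, 2.0, 3.0 + 1e-5])  # hypothetical GPU output
err = nmse(cpu, gpu)
print(err < 1e-7)  # the test passes only if the normalized error stays below 1e-7
```

Because the metric is normalized by the magnitude of the reference values, it behaves like a relative tolerance on the overall result rather than an element-wise one.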
It seems with Metal the […] (the diff below enables fast math for the CPU build as well):

```diff
diff --git a/Makefile b/Makefile
index 28c6d79b..cc578371 100644
--- a/Makefile
+++ b/Makefile
@@ -111,8 +111,8 @@ MK_CFLAGS += -Ofast
 HOST_CXXFLAGS += -Ofast
 MK_NVCCFLAGS += -O3
 else
-MK_CFLAGS += -O3
-MK_CXXFLAGS += -O3
+MK_CFLAGS += -O3 -ffast-math
+MK_CXXFLAGS += -O3 -ffast-math
 endif

 # clock_gettime came in POSIX.1b (1993)
```

However, the rounding issue remains. Sometimes, one of the Metal quants will get rounded in the wrong direction (id = 244):
I'm also not able to find an option to control the rounding mode.
Isn't it "round-to-zero" on the CPU? Casting […]
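For context on the rounding question, here is an illustrative sketch (not code from the project): a C-style float-to-int cast truncates toward zero, while `roundf`-style rounding goes to the nearest integer, and the two disagree whenever the fractional part is 0.5 or more:

```python
# int() truncates toward zero (like a C float -> int cast),
# while round() goes to the nearest integer.
x = 8.7
print(int(x))       # -> 8  (toward zero)
print(round(x))     # -> 9  (nearest)

# negative values make the "toward zero" behavior clearer:
print(int(-8.7))    # -> -8 (toward zero)
print(round(-8.7))  # -> -9 (nearest)
```

So which mode the quantization code effectively uses depends on whether it casts directly or rounds explicitly before casting.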
If my understanding of the code is correct, the tolerance is relative and applied to the mean squared error. This effectively tests whether the results are equal with a precision of ~11.6 bits. For reference, FP32 has a precision of 24 bits while FP16 has a precision of 11 bits. Considering how robust neural networks are to quantization, I don't think this is cause for concern.

Also keep in mind that relative errors have a poor condition number if the original value is small. Another project that I have worked on had similar issues when approximating functions via interpolation. An alternative metric that could be used is the asymmetry […]. This metric is more robust when […]
@JohannesGaessler I will write more info about the test later - need to AFK for a few hours
Conversion from […]. To disable fast math with Metal, it should be possible to pass a […]
You could also consider doing something similar to `numpy.allclose`, where both an absolute and a relative tolerance are provided and false is only returned if the observed difference is larger than the sum of the absolute and the relative tolerance.
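A minimal sketch of that combined-tolerance check, assuming the `numpy.allclose`-style formula `|a - b| <= atol + rtol * |b|` (the function name and default tolerances here are illustrative):

```python
import numpy as np

def close(a, b, rtol=1e-5, atol=1e-8):
    """allclose-style check: the absolute term keeps the test meaningful
    near zero, where a purely relative tolerance is ill-conditioned."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(np.abs(a - b) <= atol + rtol * np.abs(b)))

print(close([1e-12], [0.0]))       # True: the absolute tolerance absorbs it
print(close([1.0], [1.0 + 1e-3]))  # False: the relative difference is too large
```

This directly addresses the poor conditioning of relative errors near zero mentioned earlier in the thread.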
@JohannesGaessler For some of the more complicated ops we expect the results between the CPU and the GPU to be numerically different, and in these cases the question of choosing the correct metric for measuring the error is relevant. However, for the […].

I've finally found a way to disable […].

Note that we don't plan to disable […]
Wanted to find out why the Metal `test-backend-ops` is failing from time to time. This PR applies just the `GGML_OP_CPY` operation with `F32` src -> `Q4_1` dst, i.e. it performs quantization.

Running this long enough will eventually generate an error:

It looks like the floating-point operation for computing `d` can produce different results between the CPU and the GPU:

Not sure how to fix this - ideas?
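For reference, here is a simplified Python sketch of how a Q4_1-style block scale `d` could be computed (the real reference implementation is the C code in ggml; the function name and details here are illustrative). The subtraction and division that produce `d` are exactly the kind of operations that can round differently on the CPU and the GPU, e.g. under fast math:

```python
import numpy as np

def quantize_q4_1(block):
    """Simplified Q4_1-style quantization of one block (sketch only).
    Each block stores a scale d, an offset m, and 4-bit indices q."""
    block = np.asarray(block, dtype=np.float32)
    vmin = np.float32(block.min())
    vmax = np.float32(block.max())
    # the scale 'd' is the value whose computation may round differently
    # between CPU and GPU
    d = (vmax - vmin) / np.float32(15.0)
    if d == 0:
        return d, vmin, np.zeros(block.shape, dtype=np.int32)
    q = np.clip(np.round((block - vmin) / d).astype(np.int32), 0, 15)
    return d, vmin, q

d, m, q = quantize_q4_1([0.0, 0.5, 1.0, 1.5])
print(list(q))  # -> [0, 5, 10, 15]
```

A one-ulp difference in `d` can push a value whose scaled index sits near `x.5` to round in the opposite direction, which matches the "rounded in the wrong direction" failures observed above.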
While looking into this issue, I found that CUDA can also fail the CPY test sometimes. On `master`, run this and it will eventually fail, though I haven't investigated the source of the error in this case:

Sometimes it can take a while. Reproed on RTX 2060 and V100.