CLBlast: byte offset / element count confusion #3307
Comments
I think it was wrong from the beginning (since 2e6cd4b). With 2D tensors the offset is always 0, so the bug does not manifest itself. With 3D tensors, each 2D slice partially overwrites the previous slices.
I fixed this specific bug by passing a byte offset from … However, this is not usable on its own, because the uploaded data is then addressed incorrectly during computation. I don't know whether this fix is worth merging separately. Currently I have only a partial fix for the computation part.
You should make a draft PR.
Expected Behavior
Correct uploading of contiguous 3D tensor data to the GPU.
Current Behavior

1. `ggml_cl_h2d_tensor_2d` uses its `offset` argument as a byte offset in a call to `clEnqueueWriteBuffer`.
2. `ggml_cl_transform_tensor` passes an element count as `offset` to `ggml_cl_h2d_tensor_2d`. This corresponds to a byte offset only if the element size is exactly 1.

Also, I don't understand why `ggml_cl_mul_f32` passes a non-zero offset to `ggml_cl_h2d_tensor_2d`.

Environment and Context
AMD GPU
Linux
Steps to Reproduce
1. Pass a contiguous 3D tensor with `GGML_TYPE_F16` or `GGML_TYPE_F32` data to `ggml_cl_transform_tensor`.
2. Call `ggml_cl_mul_mat` on that tensor.

Ping @0cc4m @JohannesGaessler @SlyEcho