CLBlast: Fix handling of on-device tensor data #3447

Merged: 1 commit, Oct 5, 2023

Conversation

shibe2
Contributor

@shibe2 shibe2 commented Oct 2, 2023

Fixes #3307 and related issues.

I tested uploading by reading the data back from VRAM (for which there is no API). Uploading now works properly in all cases.

Matrix multiplication also works when src0 is already in VRAM, which allowed me to test broadcasting (#3402) in more configurations.

The special-case code for matrix-vector multiplication remains broken. I intend to disable it in a separate pull request, but I can do that here instead.

ggml_cl_mul (non-matrix multiplication) appears to have a problem with offsets too, but it is broken for other reasons as well, and I didn't touch it here.

I used nullptr instead of NULL in some places. Is that okay?

Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors.
Use correct offsets into data that is already in VRAM.
Correct handling of OpenCL events when multiple commands are queued.
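
To illustrate the first two points, here is a minimal host-side sketch, not the code from this PR. It assumes a hypothetical `TensorView` struct that mirrors ggml's convention of `ne` (element counts) and `nb` (strides in bytes), and a non-quantized element type. It shows how a 4D, possibly non-contiguous view can be packed into a contiguous staging buffer before upload, and how a byte offset is converted to an element offset. The distinction matters because OpenCL buffer commands such as `clEnqueueWriteBuffer` take offsets in bytes, whereas CLBlast routines take offsets in elements. The helper names `byte_offset`, `bytes_to_elements`, and `pack_contiguous` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a tensor view. It follows ggml's ne/nb naming,
// but it is NOT the real ggml_tensor struct.
struct TensorView {
    const uint8_t * data;   // host pointer to the first element of the view
    int64_t ne[4];          // number of elements in each dimension
    size_t  nb[4];          // stride of each dimension, in BYTES
    size_t  type_size;      // bytes per element (assumes a non-quantized type)
};

// Byte offset of element (i0, i1, i2, i3) relative to t.data.
size_t byte_offset(const TensorView & t, int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    return (size_t) i0 * t.nb[0] + (size_t) i1 * t.nb[1]
         + (size_t) i2 * t.nb[2] + (size_t) i3 * t.nb[3];
}

// Convert a byte offset into an element offset, as needed by APIs that
// count in elements rather than bytes.
size_t bytes_to_elements(size_t offset_bytes, size_t type_size) {
    assert(offset_bytes % type_size == 0); // must land on an element boundary
    return offset_bytes / type_size;
}

// Pack the view into a contiguous buffer in i3, i2, i1, i0 order.
// Dense rows are copied with one memcpy; strided rows element by element.
std::vector<uint8_t> pack_contiguous(const TensorView & t) {
    const size_t row_bytes = (size_t) t.ne[0] * t.type_size;
    std::vector<uint8_t> out(row_bytes * (size_t) (t.ne[1] * t.ne[2] * t.ne[3]));
    uint8_t * dst = out.data();
    for (int64_t i3 = 0; i3 < t.ne[3]; i3++) {
        for (int64_t i2 = 0; i2 < t.ne[2]; i2++) {
            for (int64_t i1 = 0; i1 < t.ne[1]; i1++) {
                const uint8_t * src = t.data + byte_offset(t, 0, i1, i2, i3);
                if (t.nb[0] == t.type_size) {
                    // Row is dense in memory: copy it in one go.
                    std::memcpy(dst, src, row_bytes);
                    dst += row_bytes;
                } else {
                    // Row is strided: gather one element at a time.
                    for (int64_t i0 = 0; i0 < t.ne[0]; i0++) {
                        std::memcpy(dst, src + (size_t) i0 * t.nb[0], t.type_size);
                        dst += t.type_size;
                    }
                }
            }
        }
    }
    return out;
}
```

In a real backend the packed buffer (or each dense row directly) would then be written to the device with a byte offset, while any offset later passed to a BLAS routine would go through something like `bytes_to_elements` first.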
@shibe2 changed the title from "CLBlast: Fix uploading tensor data to device" to "CLBlast: Fix handling of on-device tensor data" on Oct 5, 2023
@shibe2 marked this pull request as ready for review on October 5, 2023, 12:18
@shibe2 merged commit e2583cb into ggerganov:master on Oct 5, 2023
27 checks passed
Development

Successfully merging this pull request may close these issues.

CLBlast: byte offset / element count confusion