
@MengAiDev MengAiDev commented Aug 26, 2025

  • Add stream->wait() to ensure all kernels finish execution before proceeding
  • This resolves potential race conditions in the argsort operation

Fixes: #15580
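
For readers unfamiliar with this failure mode, here is a minimal host-only sketch of the synchronization pattern described above. It is not the SYCL or ggml code from this PR: a `std::future` stands in for the device queue, and the names `argsort_async` and `argsort_blocking` are hypothetical. It only illustrates that the host has to wait for the asynchronous sort before it reads the resulting indices.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an asynchronous device kernel: fills `idx`
// with the indices that sort `values` in ascending order.
std::future<void> argsort_async(const std::vector<float>& values, std::vector<int>& idx) {
    return std::async(std::launch::async, [&values, &idx] {
        std::iota(idx.begin(), idx.end(), 0);
        std::sort(idx.begin(), idx.end(),
                  [&values](int a, int b) { return values[a] < values[b]; });
    });
}

// Launches the sort, then blocks until it has finished before the host reads `idx`.
std::vector<int> argsort_blocking(const std::vector<float>& values) {
    std::vector<int> idx(values.size(), -1);  // -1 marks "not written yet"
    std::future<void> done = argsort_async(values, idx);
    done.wait();  // plays the role that stream->wait() plays in this PR

    // Bounds check in the spirit of the assert mentioned in the linked issue:
    // every index must be a valid position in `values`.
    const int n = static_cast<int>(values.size());
    for (int i : idx) {
        assert(i >= 0 && i < n);
    }
    return idx;
}
```

In this sketch, dropping the `done.wait()` line would let the host read `idx` while it may still hold the `-1` placeholders, which is the same class of problem as reading row ids before the kernel that produces them has finished.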

@github-actions bot added the ggml and SYCL labels Aug 26, 2025

@simonlui

@MengAiDev The closing brace for the function is missing, so the branch failed to compile when I checked it out. I added a line with the closing } and it works.

@NeoZhangJianyu
Collaborator

Issue #15580 was reported on an iGPU.
Could you check whether dGPUs have this issue too?
If not, maybe add a condition that checks for an iGPU and calls wait() on iGPUs only.

That could reduce the potential risk to dGPUs.

@simonlui

@NeoZhangJianyu I have an Intel Arc A770 16GB and can confirm the issue existed on my dGPU too. This is a snippet from the backtrace I posted in the issue.
/home/simonlui/Code_Repositories/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3380: GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed
Same assert error as iGPU.

@NeoZhangJianyu
Collaborator

> @NeoZhangJianyu I have an Intel Arc A770 16GB and can confirm the issue existed on my dGPU too. This is a snippet from the backtrace I posted in the issue.
> /home/simonlui/Code_Repositories/llama-cpp-python/vendor/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3380: GGML_ASSERT(row_id_i >= 0 && row_id_i < n_as) failed
> Same assert error as iGPU.

OK! Thank you for your feedback!
It's OK with me!

@MengAiDev
Author

I have fixed the missing }.

@NeoZhangJianyu
Collaborator

@MengAiDev
Sorry for the delayed reply!
I had thought another maintainer would merge this PR,
but some maintainers are not focusing on the SYCL backend right now.

I will continue to support the SYCL backend.

I tested this PR and it passed.
But in my experiments, adding wait() reduces performance slightly,
and it will break the SYCL graph feature. (That feature is pending on another issue.)

I have also created a PR to fix the argsort OP: #16521.
Could it fix issue #15580?

If the merged PR (#16521) resolves the issue, I suggest not merging this PR.

What do you think?

Thank you!

Development

Successfully merging this pull request may close these issues.

Eval bug: Asynchronous Kernel Execution on iGPU Causes Runtime Errors with MOE Model
