@shani-f shani-f commented Oct 30, 2025

Summary

This PR replaces the previous SYCL implementation of REPEAT_BACK that I added in PR #16734.
The functionality is unchanged, but the kernel is now significantly more efficient.


Changes

  • Rewrote ggml-sycl/repeat_back.cpp with a vectorized, cache-friendly kernel
  • Replaced four nested loops with a single fused loop
  • Removed expensive division and modulo operations from the hot path
  • Switched to byte-stride (nb0..nb3) addressing for correct tensor-layout access
  • Precomputed inverse sizes to avoid repeated integer math

Performance

  • ~3× fewer hot-path assembly instructions (verified in Compiler Explorer)
  • ~2× faster execution time on Intel UHD GPU
  • No additional memory allocation

Testing

  • All test-backend-ops -o REPEAT_BACK tests pass on the SYCL backend
  • Matches the CPU backend results bit-for-bit
  • Verified across different tensor sizes, dimensions, and repeat configurations

Compatibility

  • Supports F32 tensors
  • Works on Level-Zero and OpenCL backends
  • Follows the existing SYCL backend structure and coding style

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (GPU programming language) labels Oct 30, 2025
shani-f commented Nov 2, 2025

Hello @CISC @NeoZhangJianyu,
I applied the required fixes and pushed the updated version.
Please let me know if anything else is needed.
Thanks!

@NeoZhangJianyu NeoZhangJianyu left a comment


Good job!

Thank you!

@NeoZhangJianyu NeoZhangJianyu merged commit 7e99416 into ggml-org:master Nov 3, 2025
123 of 130 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Nov 3, 2025
* origin/master: (169 commits)
opencl: support imrope (ggml-org#16914)
fix: Viewing multiple PDF attachments (ggml-org#16974)
model-conversion : pass config to from_pretrained (ggml-org#16963)
server : add props.model_alias (ggml-org#16943)
ggml: CUDA: add head size 72 for flash-attn (ggml-org#16962)
mtmd: add --image-min/max-tokens (ggml-org#16921)
mtmd: pad mask for qwen2.5vl (ggml-org#16954)
ggml : LoongArch fixes (ggml-org#16958)
sync: minja (glm 4.6 & minmax m2 templates) (ggml-org#16949)
SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× faster) (ggml-org#16869)
feat(webui): improve LaTeX rendering with currency detection (ggml-org#16508)
test-backend-ops : fix segfault in moe-expert-reduce test in support mode and coverage (ggml-org#16936)
ci : disable failing riscv cross build (ggml-org#16952)
model: add Janus Pro for image understanding (ggml-org#16906)
clip : use FA (ggml-org#16837)
server : support unified cache across slots (ggml-org#16736)
common : move gpt-oss reasoning processing to init params (ggml-org#16937)
docs: remove llama_sampler_accept reference in sampling sample usage (ggml-org#16920)
CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (ggml-org#16917)
devops: fix failing s390x docker build (ggml-org#16918)
...