[SYCL][Devicelib] Implement cmath rintf wrapper with __spirv_ocl_rint #18857

wenju-he · 2025-06-09T07:48:10Z

This PR adds support for using std::rint in device code. Currently the
call resolves to the rintf symbol. With this PR, libdevice resolves
that symbol.
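For context, a minimal host-side sketch of the call this PR targets. It assumes the default round-to-nearest-even rounding mode, and `round_like_kernel` and `check_rint_semantics` are hypothetical names used only for illustration. In device code, the same `std::rint(float)` call is the one that ends up as a reference to `rintf`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper standing in for the body of a SYCL kernel.
// std::rint(float) rounds to an integral value using the current
// rounding mode (round-to-nearest-even by default).
inline float round_like_kernel(float x) { return std::rint(x); }

// Small self-check, assuming the default rounding mode is in effect.
inline void check_rint_semantics() {
  assert(round_like_kernel(2.5f) == 2.0f);  // tie goes to the even value
  assert(round_like_kernel(3.5f) == 4.0f);  // tie goes to the even value
  assert(round_like_kernel(-0.4f) == 0.0f); // -0.0f compares equal to 0.0f
}
```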

npmiller

LGTM, we could probably remove the NVPTX/AMDGCN specific path and use the __spirv_ocl_rint path for them as well, but we can look at that in a different patch.

bader · 2025-06-09T14:31:15Z

In general, there is no point in adding a __devicelib_<func> fallback implementation to the devicelib for functions that have a corresponding __spirv_ocl_<func>. The whole idea behind __devicelib_ is to avoid adding SPIR-V extensions for C/C++ standard library functions. If a SPIR-V instruction is already defined, we can implement the corresponding C/C++ functions as wrappers in the headers.

againull

Could you add a test, please?

wenju-he · 2025-06-10T03:42:02Z

> If SPIR-V instruction is already defined, we can implement corresponding C/C++ functions as wrappers in the headers.

Could you point me to the wrapper that implements std::rint with __spirv_ocl_rint?

This PR is to support use of std::rint in device code. Currently it is resolved to rintf symbol. With this PR, rintf symbol is resolved by libdevice.

jinge90 · 2025-06-10T05:21:33Z

> > If SPIR-V instruction is already defined, we can implement corresponding C/C++ functions as wrappers in the headers.
>
> Could you point me to the wrapper that implements std::rint with __spirv_ocl_rint?
>
> This PR is to support use of std::rint in device code. Currently it is resolved to rintf symbol. With this PR, rintf symbol is resolved by libdevice.

Hi @bader and @wenju-he,
If this is to support C++ std::rint, the declaration lives in the user's system headers, which we can't modify. But since we have decided to remove the fallback devicelib, we can use __spirv_ocl_rint directly in the 'rintf' function in cmath_wrapper.cpp.
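A minimal sketch of that suggestion, not the actual contents of cmath_wrapper.cpp. Both names are hypothetical: `spirv_ocl_rint_standin` replaces the real `__spirv_ocl_rint` builtin, which is only available when compiling for a device, and `sketch_rintf` replaces `rintf`, which would clash with the host libm symbol here.

```cpp
#include <cassert>
#include <cmath>

// Host stand-in (assumption) for the __spirv_ocl_rint builtin.
// std::nearbyint rounds using the current rounding mode, like rint.
static float spirv_ocl_rint_standin(float x) { return std::nearbyint(x); }

// The wrapper has C linkage so that an unresolved reference to the
// C symbol can bind to it. The real wrapper would be named rintf.
extern "C" float sketch_rintf(float x) { return spirv_ocl_rint_standin(x); }
```

With a definition like this linked in, no separate fallback implementation is needed: the wrapper simply forwards to the builtin.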

wenju-he · 2025-06-10T05:55:19Z

> the header file is in user's system header which we can't modify. But we have decided to remove fallback devicelib, we can directly use __spirv_ocl_rint in cmath_wrapper.cpp 'rintf' function.

done, thanks.

npmiller · 2025-06-10T08:13:00Z

> > If SPIR-V instruction is already defined, we can implement corresponding C/C++ functions as wrappers in the headers.
>
> Could you point me to the wrapper that implements std::rint with __spirv_ocl_rint?
>
> This PR is to support use of std::rint in device code. Currently it is resolved to rintf symbol. With this PR, rintf symbol is resolved by libdevice.

It's not actually ready yet, but I've been working on that in #18706, which will address this rint issue once completed. It still needs more work and is currently only tested for CUDA/HIP, so I think it's fine to go ahead with this PR until the header solution is ready.

wenju-he · 2025-06-11T01:22:49Z

> It's not actually ready yet, but I've been working on that in #18706 which will address this rint issue once completed, but it needs more work and is currently only tested for CUDA/HIP. So I think it's fine to go ahead with this PR until the header solution is ready.

thanks @npmiller, #18706 looks great.

### Overview

Currently, to support C++ builtins in SYCL kernels, we rely on `libdevice`, which provides implementations for standard library builtins. This library is built either to bitcode or SPIR-V and linked into our kernels.

On some targets this causes issues because clang sometimes turns standard library calls into LLVM intrinsics that not all targets support. Specifically, on NVPTX and AMDGCN we can't easily support these intrinsics because we currently use implementations provided by CUDA and HIP in the form of a bitcode library, which is not something we can use from the LLVM backend.

In upstream LLVM, CUDA and HIP kernels handle this with clang headers that provide device-side overloads of C++ library functions, which hook into the target-specific versions of the builtins (for example `std::sin` to `__nv_sin`). This way, on the device side, C++ builtins are hijacked before clang can turn them into intrinsics, which solves the issue mentioned above.

This patch adds the infrastructure to handle C++ builtins in SYCL in the same way as it is done for CUDA and HIP in upstream LLVM, and uses it to support `cmath` in NVPTX and AMDGCN compilation.

## Breakdown

* Add `sycl_device_only` attribute: this new attribute allows functions marked with it to be treated as device-side overloads of existing functions. This is what allows us to overload C++ library functions for device in SYCL.
* Remove the clang hack that prevented generating LLVM intrinsics from standard library builtins for NVPTX and AMDGCN. In theory, since this is only moving `cmath`, the hack could still be needed, but it looks fine in testing, and if we run into issues we should just move the problematic builtins to this solution. The test `sycl-libdevice-cmath.cpp` was testing this hack, so it was removed.
* `cmath` support for NVPTX and AMDGCN in `libdevice` was disabled. To limit the scope of the patch, `libdevice` is still fully wired up for these targets, but it just won't provide the `cmath` functions.
* Added a `cmath-fallback.h` header providing the device-side math function overloads. They are defined using SPIR-V builtins, so in theory this header could be used as-is for other targets.
* Use our existing `cmath` stl wrapper to include `cmath-fallback.h` for NVPTX and AMDGCN. In upstream LLVM, `clang-cuda` always includes the header with these overloads using `-include`; using the stl wrappers is a bit more selective.
* Add `rint` to device lib tests and stl wrapper. This was added in #18857 but wasn't in E2E testing.

## Compile-time performance

A quick check of compile time shows that this seems to provide a small performance improvement. Using two samples, one using cmath (the E2E `cmath_test.cpp`) and one not using cmath, over 10 iterations, I'm getting the following results:

| Run | Mean | Stdev |
|:--:|:--:|:--:|
| With patch, cmath sample | 4.2229s | 0.0294s |
| With patch, no cmath sample | 5.7484s | 0.0525s |
| Without patch, cmath sample | 4.3817s | 0.0424s |
| Without patch, no cmath sample | 5.7941s | 0.0452s |

This suggests that the no-cmath compile time is pretty much equivalent, and that the cmath compile time is faster by roughly 0.16s (4.3817s versus 4.2229s). And this is with the whole `libdevice` setup still in place, so it's possible this approach could be even more beneficial with more work.

## Future work

* Investigate the commented-out standard math builtins in `cmath-fallback.h`. These weren't defined in libdevice; we should either remove the commented-out lines or implement them properly.
* Untangle `cmath` and `math.h`. The current `cmath-fallback.h` implements both, which seems to work fine, but ideally we should split it up.
* Deal with `nearbyint`. This was only implemented for NVPTX and AMDGCN in `libdevice`; this patch keeps it the same, but we should look into proper support and testing for this.
* Move more of `libdevice` into headers (complex, assert, crt, etc ...).
* Try this approach for SPIR-V or other targets.

---------

Co-authored-by: Alexey Bader <alexey.bader@intel.com>
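As a quick sanity check on the compile-time table in the #18706 description, the sketch below recomputes the differences between the means. The four rows are copied from the table; `Timing`, `saved_seconds`, and `exceeds_noise` are hypothetical names, and comparing the saving against a single standard deviation is only a crude heuristic, not a statistical test.

```cpp
#include <cassert>
#include <cmath>

// One row of the compile-time table: mean and standard deviation in seconds.
struct Timing {
  double mean_s;
  double stdev_s;
};

// Values copied from the table (10 iterations each).
constexpr Timing kWithPatchCmath{4.2229, 0.0294};
constexpr Timing kWithPatchNoCmath{5.7484, 0.0525};
constexpr Timing kWithoutPatchCmath{4.3817, 0.0424};
constexpr Timing kWithoutPatchNoCmath{5.7941, 0.0452};

// Seconds saved by the patch; positive means the patched build is faster.
constexpr double saved_seconds(const Timing &without, const Timing &with) {
  return without.mean_s - with.mean_s;
}

// Crude heuristic: does the saving exceed both standard deviations?
inline bool exceeds_noise(const Timing &without, const Timing &with) {
  const double d = saved_seconds(without, with);
  return d > without.stdev_s && d > with.stdev_s;
}
```

On these numbers the cmath sample saves about 0.16s, well above either standard deviation, while the no-cmath sample differs by about 0.05s, which is within the noise.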

[SYCL][Devicelib] Add __devicelib_rint and fallback to __spirv_ocl_rint

f1a00d8

wenju-he requested a review from a team as a code owner June 9, 2025 07:48

wenju-he requested a review from againull June 9, 2025 07:48

wenju-he requested a review from jinge90 June 9, 2025 07:48

npmiller approved these changes Jun 9, 2025

jinge90 approved these changes Jun 9, 2025

againull reviewed Jun 9, 2025

add test, remove fallback

b7b3385

wenju-he changed the title ~~[SYCL][Devicelib] Add __devicelib_rint and fallback to __spirv_ocl_rint~~ [SYCL][Devicelib] Implement cmath rintf wrapper with __spirv_ocl_rint Jun 10, 2025

wenju-he requested review from againull and jinge90 June 10, 2025 05:55

remove sycl test, rint is lowered to llvm.rint.f32 intrinsic for spir…

f340020

… target

againull approved these changes Jun 10, 2025

againull merged commit fbf735a into intel:sycl Jun 10, 2025
24 checks passed

wenju-he deleted the __devicelib_rint branch June 11, 2025 01:21

npmiller mentioned this pull request Jun 18, 2025

[SYCL][NVPTX][AMDGCN] Move devicelib cmath to header #18706

Merged
