[sycl_ext_oneapi_clock] implement NVPTX case by tdavidcl · Pull Request #21280 · intel/llvm

tdavidcl · 2026-02-12T19:16:21Z

Hi, following a suggestion from @zjin-lcf, here is a PR (context: KhronosGroup/SYCL-Docs#958).
It implements the NVPTX variant of clock() using the %clock64 special register from PTX.

https://docs.nvidia.com/cuda/archive/10.1/parallel-thread-execution/index.html?utm_source=chatgpt.com#special-registers-clock64

PTX ISA Notes
Introduced in PTX ISA version 2.0.

So it is safe to assume that the register is supported regardless of the PTX version in use, since intel/llvm assumes a PTX version above 5.0, if I recall correctly.

Reference for its usage inside LLVM (in this repo, actually, nice :) ):

llvm/clang/lib/Headers/__clang_cuda_device_functions.h

Line 1558 in 2705b67

__DEVICE__ long long clock64() { return __nvvm_read_ptx_sreg_clock64(); }

(There is a typo in the PR which is already fixed by a commit, but I don't know why it is not updating in the PR ...)

tdavidcl · 2026-02-12T19:42:26Z

Also, I just found out that LLVM has this file, libc/src/__support/GPU/utils.h, which defines
uint64_t processor_clock() { return __builtin_readcyclecounter(); }
which is apparently used in all the tests.

We could maybe use that for both Nvidia and AMD since that's what is called within the CI.

zjin-lcf · 2026-02-12T22:46:45Z

Thank you. I found a post, ROCm/ROCm#1288, that may be related to your comments.

sycl/include/sycl/ext/oneapi/experimental/clock.hpp

Co-authored-by: Alexey Bader <alexey.bader@intel.com>

tdavidcl · 2026-02-12T22:54:01Z

> Thank you. I found a post, ROCm/ROCm#1288, that may be related to your comments.

It seems that the native builtins are better whenever they are available. Should I try replacing the AMD branch and the else branch with __builtin_readcyclecounter, then?
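
For illustration, a minimal sketch of the dispatch being discussed, not the actual contents of clock.hpp: the NVPTX branch reads the counter through `__nvvm_read_ptx_sreg_clock64()`, and other targets use `__builtin_readcyclecounter()` when the compiler provides it. The helper name `read_cycle_counter`, the restriction of the builtin to AMDGCN and x86, and the `std::chrono` path are assumptions added here so that the snippet also builds with a plain host compiler.

```cpp
#include <chrono>
#include <cstdint>

// __has_builtin is a compiler extension; treat it as "not available" elsewhere.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical helper, not part of the SYCL headers.
inline std::uint64_t read_cycle_counter() {
#if defined(__NVPTX__)
  // Same builtin that clang's CUDA wrapper header uses for clock64().
  return static_cast<std::uint64_t>(__nvvm_read_ptx_sreg_clock64());
#elif __has_builtin(__builtin_readcyclecounter) &&                             \
    (defined(__AMDGCN__) || defined(__x86_64__) || defined(__i386__))
  // Assumption of this sketch: only use the builtin on targets where the
  // counter is known to be readable from unprivileged code.
  return __builtin_readcyclecounter();
#else
  // Stand-in for compilers or targets without the builtin: a monotonic
  // host clock, which counts ticks rather than cycles.
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

The fallback branch only exists to keep the sketch portable; the change discussed in this thread uses the builtin directly in the AMD and else branches.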

zjin-lcf · 2026-02-12T23:04:32Z

@tdavidcl Please give it a try for the AMD and the else branches. Thanks.

tdavidcl · 2026-02-13T17:59:10Z

I've added it; now it needs a bit of testing. I do not have access to an AMD GPU right now, though. The best course of action would probably be a simple test that checks that it compiles in all configurations and that the returned value is non-zero and monotonically increasing across subsequent calls. Where is the best spot to add such a test?
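
As a rough sketch of the check described above, assuming the kernel has already written successive clock() readings from one work-item into a host-visible vector: the helper name `samples_look_monotonic` is hypothetical, and the comparison is non-strict because two back-to-back reads may return the same value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side check: true when every sample is non-zero and no
// sample is smaller than the one taken before it.
inline bool samples_look_monotonic(const std::vector<std::uint64_t> &samples) {
  for (std::size_t i = 0; i < samples.size(); ++i) {
    if (samples[i] == 0)
      return false; // a zero reading suggests the counter is not wired up
    if (i > 0 && samples[i] < samples[i - 1])
      return false; // the counter went backwards
  }
  return true;
}
```

Counter wrap-around is ignored here, which seems acceptable for a short test run with a 64-bit counter.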

KornevNikita · 2026-02-16T17:06:32Z

> I've added it; now it needs a bit of testing. I do not have access to an AMD GPU right now, though. The best course of action would probably be a simple test that checks that it compiles in all configurations and that the returned value is non-zero and monotonically increasing across subsequent calls. Where is the best spot to add such a test?

https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/Clock

KornevNikita · 2026-02-16T17:21:06Z

@tdavidcl thanks for working on this! Also, these functions require the device to support the corresponding aspects:

llvm/sycl/include/sycl/ext/oneapi/experimental/clock.hpp

Lines 47 to 49 in 1a77753

    
#ifdef __SYCL_DEVICE_ONLY__
[[__sycl_detail__::__uses_aspects__(sycl::aspect::ext_oneapi_clock_device)]]
#endif

That means we also need something like this, but for the CUDA adapter.

KornevNikita · 2026-02-16T17:56:43Z

sycl/include/sycl/ext/oneapi/experimental/clock.hpp

+// this is due to potential higher overhead compared to a native API call
+// see : https://github.com/ROCm/ROCm/issues/1288
+#if defined(__NVPTX__)
+  if constexpr (Scope == work_group || Scope == sub_group) {


Suggested change

if constexpr (Scope == work_group || Scope == sub_group) {

if constexpr (Scope == clock_scope::work_group || Scope == clock_scope::sub_group) {

Note: do not apply this as is; clang-format will fail because lines must be at most 80 characters.

Probably something like:

if constexpr (Scope == clock_scope::work_group ||
              Scope == clock_scope::sub_group) {
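
To illustrate why the qualified names matter, here is a small self-contained sketch. The `clock_scope` definition below is a stand-in with an assumed shape, and `selected_branch` is a hypothetical function that only reports which branch is taken; neither is copied from clock.hpp.

```cpp
// Stand-in for the extension's scoped enumeration (assumed shape).
enum class clock_scope { sub_group, work_group, device };

// Hypothetical function: returns 1 for the narrower scopes and 0 otherwise.
template <clock_scope Scope> constexpr int selected_branch() {
  // In this sketch `Scope == work_group` would not compile: enumerators of
  // an `enum class` are only reachable through the `clock_scope::` qualifier.
  if constexpr (Scope == clock_scope::work_group ||
                Scope == clock_scope::sub_group) {
    return 1; // work-group and sub-group path
  } else {
    return 0; // device-scope path
  }
}

// The branch is chosen at compile time, so it can be checked statically.
static_assert(selected_branch<clock_scope::work_group>() == 1);
static_assert(selected_branch<clock_scope::device>() == 0);
```

Breaking the condition after `||` also keeps both lines within the 80-character limit mentioned above.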

tdavidcl · 2026-02-16T17:58:54Z

> > I've added it; now it needs a bit of testing. I do not have access to an AMD GPU right now, though. The best course of action would probably be a simple test that checks that it compiles in all configurations and that the returned value is non-zero and monotonically increasing across subsequent calls. Where is the best spot to add such a test?
>
> https://github.com/intel/llvm/tree/sycl/sycl/test-e2e/Clock

Oh, perfect: it looks like no changes are required in the tests besides enabling the device aspect. Additionally, the clock test contains this snippet:

// UNSUPPORTED: target-native_cpu
// UNSUPPORTED-TRACKER: https://github.com/intel/llvm/issues/20142

I have to check, but I think that __builtin_readcyclecounter does support the host, so maybe clock() could also be enabled for target-native_cpu.

[sycl_ext_oneapi_clock] implement NVPTX case

f6ed79d

tdavidcl requested a review from a team as a code owner February 12, 2026 19:16

tdavidcl requested a review from cperkinsintel February 12, 2026 19:16

typo

064f8ef

bader reviewed Feb 12, 2026

sycl/include/sycl/ext/oneapi/experimental/clock.hpp (outdated, resolved)

Update sycl/include/sycl/ext/oneapi/experimental/clock.hpp

ba65f4a

Co-authored-by: Alexey Bader <alexey.bader@intel.com>

use __builtin_readcyclecounter as fallback

b43c2b8

tdavidcl temporarily deployed to WindowsCILock February 16, 2026 17:12 — with GitHub Actions Inactive

tdavidcl temporarily deployed to WindowsCILock February 16, 2026 17:39 — with GitHub Actions Inactive

KornevNikita reviewed Feb 16, 2026

[sycl_ext_oneapi_clock] implement NVPTX case #21280
tdavidcl wants to merge 4 commits into intel:sycl from tdavidcl:ptx-clock


4 participants
