Skip to content

Conversation

tahonermann
Copy link
Contributor

@tahonermann tahonermann commented Aug 6, 2025

The sycl_kernel_entry_point attribute facilitates the generation of an offload kernel entry point function with parameters corresponding to the (potentially decomposed) kernel arguments and a body that (potentially reconstructs the arguments and) executes the kernel. This change adds symmetric support for the SYCL host through an interface that provides symbol names and (potentially decomposed) kernel arguments to the SYCL library.

Consider the following function declared with the sycl_kernel_entry_point attribute with a call to this function occurring in the implementation of a SYCL kernel invocation function such as sycl::handler::single_task().

  template<typename KernelNameType, typename KernelType>
  [[clang::sycl_kernel_entry_point(KernelNameType)]]
  void kernel_entry_point(KernelType kerne) {
    kernel();
  }

The body of the above function specifies the parameters and body of the generated offload kernel entry point. Clearly, a call to the above function by a SYCL kernel invocation function is not intended to execute the body as written. Previously, code generation emitted an empty function body so that calls to the function had no effect other than to trigger the generation of the offload kernel entry point. The function body is therefore available to hook for SYCL library support and is now substituted with a call to a (SYCL library provided) function template named sycl_enqueue_kernel_launch() with the kernel name type passed as the first template argument, the symbol name of the offload kernel entry point passed as a string literal for the first function argument, and the (possibly decomposed) parameters passed as the remaining explicit function arguments. Given a call like this:

  kernel_entry_point<struct KN>([]{})

the body of the instantiated kernel_entry_point() specialization would be substituted as follows with "kernel-symbol-name" substituted for the generated symbol name and kernel forwarded (This assumes no kernel argument decomposition; if decomposition was required, kernel would be replaced with its corresponding decomposed arguments).

  sycl_enqueue_kernel_launch<KN>("kernel-symbol-name", kernel)

Name lookup and overload resolution for the sycl_enqueue_kernel_launch() function is performed at the point of definition of the sycl_kernel_entry_point attributed function (or the point of instantiation for an instantiated function template specialization). If overload resolution fails, the program is ill-formed.

Implementation of the sycl_enqueue_kernel_launch() function might require additional information provided by the SYCL library. This is facilitated by removing the previous prohibition against use of the sycl_kernel_entry_point attribute with a non-static member function. If the sycl_kernel_entry_point attributed function is a non-static member function, then overload resolution for the sycl_enqueue_kernel_launch() function template may select a non-static member function in which case, this will be implicitly passed as the implicit object argument.

If a sycl_kernel_entry_point attributed function is a non-static member function, use of this in a potentially evaluated expression is prohibited in the definition (since this is not a kernel argument and will not be available within the generated offload kernel entry point function).

Support for kernel argument decomposition and reconstruction is not yet implemented.

@tahonermann tahonermann added the SYCL https://registry.khronos.org/SYCL label Aug 6, 2025
@tahonermann tahonermann requested a review from bader August 6, 2025 22:57
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fznamznon, I added this just for temporary testing purposes so that I could validate that code generation was working as expected. Some of this might be useful to you; the creation of the implicit array-to-pointer-decay cast for example. Use whatever you find useful and remove the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fznamznon, this is the major work that is left to do. Note that name lookup and overload resolution should be performed from the point of definition of the sycl_kernel_entry_point attributed function or, if that function is an instantiated function template specialization, the point of instantiation. I'm hopeful that SemaRef.CurContext reflects this point; see the associated assert in SemaSYCL::BuildSYCLKernelCallStmt() below.

We should consider whether to support implicit object parameter declarations or, at a minimum, add a test to ensure a proper diagnostic is issued for an example like this:

struct sycl_kernel_launcher {
  template<typename KN, typename... Ts>
  void sycl_enqueue_kernel_launch(this S* self, const char *name, Ts...);
  template<typename KN, typename KT>
  [[clang::sycl_kernel_entry_point(KN)]]
  void kernel_entry_point(KT kernel) { ... }
};

There is a design decision we can consider here too. Should we allow the sycl_enqueue_kernel_launch name to be a default name that can be customized by an additional argument to the sycl_kernel_entry_point attribute? Let me know if you have opinions on it. I'm otherwise content to leave such considerations to future work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to go with sycl_enqueue_kernel_launch as a default name, it should either be reserved by SYCL spec or we need to add __ to it, otherwise it may/will clash with user's code. Same goes with class's name sycl_kernel_launcher. I don't have a strong preference on whether it should be possible to customize sycl_enqueue_kernel_launch name, I think people from Runtime library side may have a better understanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reserved name isn't needed since the SYCL RT will provide the sycl_enqueue_kernel_launch declaration. We aren't implicitly declaring any entities with this approach; we're just performing lookup for entities declared by the SYCL RT.

The sycl_kernel_launcher name isn't special in any way; I just used that name as an example. In this example, lookup for sycl_enqueue_kernel_launch would be performed from the definition of sycl_kernel_launcher::kernel_entry_point() and sycl_kernel_launcher::sycl_enqueue_kernel_launch() is in scope for name lookup at that point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't implicitly declaring any entities with this approach; we're just performing lookup for entities declared by the SYCL RT.

I understand that. Still it may be unexpected for the users that SYCL RT declares that name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We'll need to ensure the diagnostic issued when overload resolution fails is clear. I hope we can do better than what Clang does for CUDA when the __cudaPushCallConfiguration() declaration is missing:

t.cu:6:9: error: use of undeclared identifier __cudaPushCallConfiguration
   6 |   kernel<<<1,1>>>([] __attribute__((device)) {});
     |         ^

@tahonermann tahonermann force-pushed the sycl-upstream-fe-sycl_kernel_entry_point-host branch from 56407fb to 5b42f6b Compare August 7, 2025 02:27
// CHECK-HOST-LINUX-NEXT: ret void
// CHECK-HOST-LINUX-NEXT: }
//
// CHECK-HOST-LINUX: define internal void @"_Z18kernel_single_taskI6\CE\B4\CF\84\CF\87Z4mainEUliE_EvT0_"() #{{[0-9]+}} {
Copy link
Contributor

@Fznamznon Fznamznon Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tahonermann , shouldn't that function and others like it accept kernel object parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, yes it should be kind of maybe. I think what is happening here is that the kernel object is passed by value and, since the kernel object type is an empty class, the target ABI specifies that no arguments actually be passed; the (stateless) parameter object is constructed in the body of the function.

We could add state to the kernel object types to force an argument to be provided. I don't think it matters much though; all we really need to validate here is that the synthesized call to the sycl_enqueue_kernel_launch function is present and looks correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that we should validate is exception handling. I don't know if any actual code changes will be required. Scenarios to be considered:

  • The sycl_kernel_entry_point function is declared noexcept; an exception propagating from the call to the sycl_enqueue_kernel_launch() function (including from construction of arguments) should result in a call to std::terminate().
  • The sycl_kernel_entry_point function is not declared noexcept; if the called sycl_enqueue_kernel_launch() function is also not declared noexcept, then exception handling scaffolding might be required (e.g., to destruct value parameters that have non-trivial destruction) and a thrown exception should propagate.
  • The sycl_kernel_entry_point function is not declared noexcept; if the called sycl_enqueue_kernel_launch() function is declared noexcept, then exception handling scaffolding can be omitted.

…t attribute.

The `sycl_kernel_entry_point` attribute facilitates the generation of an
offload kernel entry point function with parameters corresponding to the
(potentially decomposed) kernel arguments and a body that (potentially
reconstructs the arguments and) executes the kernel. This change adds
symmetric support for the SYCL host through an interface that provides
symbol names and (potentially decomposed) kernel arguments to the SYCL
library.

Consider the following function declared with the `sycl_kernel_entry_point`
attribute with a call to this function occurring in the implementation of
a SYCL kernel invocation function such as `sycl::handler::single_task()`.
  template<typename KernelNameType, typename KernelType>
  [[clang::sycl_kernel_entry_point(KernelNameType)]]
  void kernel_entry_point(KernelType kerne) {
    kernel();
  }

The body of the above function specifies the parameters and body of the
generated offload kernel entry point. Clearly, a call to the above function
by a SYCL kernel invocation function is not intended to execute the body
as written. Previously, code generation emitted an empty function body so
that calls to the function had no effect other than to trigger the generation
of the offload kernel entry point. The function body is therefore available
to hook for SYCL library support and is now substituted with a call to a
(SYCL library provided) function template named `sycl_enqueue_kernel_launch()`
with the kernel name type passed as the first template argument, the
symbol name of the offload kernel entry point passed as a string literal for
the first function argument, and the (possibly decomposed) parameters passed
as the remaining explicit function arguments. Given a call like this:
  kernel_entry_point<struct KN>([]{})
the body of the instantiated `kernel_entry_point()` specialization would be
substituted as follows with "kernel-symbol-name" substituted for the
generated symbol name and `kernel` forwarded (This assumes no kernel
argument decomposition; if decomposition was required, `kernel` would be
replaced with its corresponding decomposed arguments).
  sycl_enqueue_kernel_launch<KN>("kernel-symbol-name", kernel)

Name lookup and overload resolution for the `sycl_enqueue_kernel_launch()`
function is performed at the point of definition of the
`sycl_kernel_entry_point` attributed function (or the point of instantiation
for an instantiated function template specialization). If overload
resolution fails, the program is ill-formed.

Implementation of the `sycl_enqueue_kernel_launch()` function might require
additional information provided by the SYCL library. This is facilitated by
removing the previous prohibition against use of the `sycl_kernel_entry_point`
attribute with a non-static member function. If the `sycl_kernel_entry_point`
attributed function is a non-static member function, then overload resolution
for the `sycl_enqueue_kernel_launch()` function template may select a
non-static member function in which case, `this` will be implicitly passed
as the implicit object argument.

If a `sycl_kernel_entry_point` attributed function is a non-static member
function, use of `this` in a potentially evaluated expression is prohibited
in the definition (since `this` is not a kernel argument and will not be
available within the generated offload kernel entry point function).

Support for kernel argument decomposition and reconstruction is not yet
implemented.
@tahonermann tahonermann force-pushed the sycl-upstream-fe-sycl_kernel_entry_point-host branch from 7a913b2 to 70f34c3 Compare August 25, 2025 18:22
…#51)

* Add support for host kernel launch stmt generation

This adds generation of a call to sycl_enqueue_kernel_launch function
aka "launcher" function. The launcher function can be a memeber of a
class or a free function defined at namespace scope. The lookup is
performed from SKEP attributed function scope. Because unqualified
lookup requires Scope object present and it only exists during parsing
stage and already EOLed at the point where templates instantiated, I had
to move some parts of SYCLKernelCallStmt generation to earlier stages
and now TreeTransform knows how to process SYCLKernelCallStmt.
I also had to invent a new expression - UnresolvedSYCLKernelExpr which
represents a string containing kernel name of a kernel that doesn't
exist yet. This expression is supposed to be transformed to a
StringLiteral during template instantiation phase. It should never reach
AST consumers like CodeGen of constexpr evaluators. This still requires
more testing and FIXME cleanups, but since it evolved into a quite
complicated patch I'm pushing it for earlier feedback.

* Remove a fixme from SemaSYCL

* Do not crash if original body was invalid

* Add AST test for skep-attributed member

* Fix a warning

* Extend codegen test a bit

* Find and replace

UnresolvedSYCLKernelNameExpr -> UnresolvedSYCLKernelLaunchExpr

* Implement the thing

* One more find and replace

* I don't know how it looks like

* Find and replace again

* Switch to UnresolvedSYCLKernelEntryPointStmt

* Apply suggestions from code review

* Remove log.txt

* Implement visiting

* Add tests

* Apply suggestions from code review

Co-authored-by: Tom Honermann <tom@honermann.net>

* IdExpr -> KernelLaunchIdExpr

* Don't rely on compound

* UnresolvedSYCLKernelEntryPointStmt -> UnresolvedSYCLKernelCall

* Fix warnings

* Rename sycl_enqueue_kernel_launch -> sycl_kernel_launch

* Apply suggestions from code review

Co-authored-by: Tom Honermann <tom@honermann.net>

* Remove array decay

* Add windows run line to the sema test

---------

Co-authored-by: Tom Honermann <tom@honermann.net>
Copy link

github-actions bot commented Sep 29, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp,h -- clang/test/AST/ast-print-sycl-kernel-call.cpp clang/test/SemaSYCL/sycl-host-kernel-launch.cpp clang/test/SemaSYCL/sycl-kernel-entry-point-attr-this.cpp clang/include/clang/AST/ASTNodeTraverser.h clang/include/clang/AST/RecursiveASTVisitor.h clang/include/clang/AST/StmtSYCL.h clang/include/clang/Sema/ScopeInfo.h clang/include/clang/Sema/SemaSYCL.h clang/include/clang/Serialization/ASTBitCodes.h clang/lib/AST/ComputeDependence.cpp clang/lib/AST/StmtPrinter.cpp clang/lib/AST/StmtProfile.cpp clang/lib/CodeGen/CGStmt.cpp clang/lib/CodeGen/CodeGenFunction.h clang/lib/CodeGen/CodeGenSYCL.cpp clang/lib/Sema/SemaDecl.cpp clang/lib/Sema/SemaExceptionSpec.cpp clang/lib/Sema/SemaSYCL.cpp clang/lib/Sema/TreeTransform.h clang/lib/Serialization/ASTReaderStmt.cpp clang/lib/Serialization/ASTWriterStmt.cpp clang/lib/StaticAnalyzer/Core/ExprEngine.cpp clang/test/ASTSYCL/ast-dump-sycl-kernel-call-stmt.cpp clang/test/ASTSYCL/ast-dump-sycl-kernel-entry-point.cpp clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp clang/test/CodeGenSYCL/unique_stable_name_windows_diff.cpp clang/test/SemaSYCL/sycl-kernel-entry-point-attr-appertainment.cpp clang/test/SemaSYCL/sycl-kernel-entry-point-attr-grammar.cpp clang/test/SemaSYCL/sycl-kernel-entry-point-attr-kernel-name-module.cpp clang/test/SemaSYCL/sycl-kernel-entry-point-attr-kernel-name-pch.cpp clang/test/SemaSYCL/sycl-kernel-entry-point-attr-kernel-name.cpp clang/test/SemaSYCL/sycl-kernel-entry-point-attr-sfinae.cpp clang/tools/libclang/CXCursor.cpp

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/clang/include/clang/AST/StmtSYCL.h b/clang/include/clang/AST/StmtSYCL.h
index c8da49873..8a8fd7d16 100644
--- a/clang/include/clang/AST/StmtSYCL.h
+++ b/clang/include/clang/AST/StmtSYCL.h
@@ -118,8 +118,8 @@ class UnresolvedSYCLKernelCallStmt : public Stmt {
   void setOriginalStmt(CompoundStmt *CS) { OriginalStmt = CS; }
 
 public:
-  static UnresolvedSYCLKernelCallStmt *
-  Create(const ASTContext &C, CompoundStmt *CS, Expr *IdExpr) {
+  static UnresolvedSYCLKernelCallStmt *Create(const ASTContext &C,
+                                              CompoundStmt *CS, Expr *IdExpr) {
     return new (C) UnresolvedSYCLKernelCallStmt(CS, IdExpr);
   }
 
diff --git a/clang/include/clang/Sema/SemaSYCL.h b/clang/include/clang/Sema/SemaSYCL.h
index 76046b765..7177d5163 100644
--- a/clang/include/clang/Sema/SemaSYCL.h
+++ b/clang/include/clang/Sema/SemaSYCL.h
@@ -69,8 +69,7 @@ public:
   StmtResult BuildSYCLKernelCallStmt(FunctionDecl *FD, CompoundStmt *Body,
                                      Expr *LaunchIdExpr);
   ExprResult BuildSYCLKernelLaunchIdExpr(FunctionDecl *FD, QualType KNT);
-  StmtResult BuildUnresolvedSYCLKernelCallStmt(CompoundStmt *CS,
-                                                     Expr *IdExpr);
+  StmtResult BuildUnresolvedSYCLKernelCallStmt(CompoundStmt *CS, Expr *IdExpr);
 };
 
 } // namespace clang
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index bde026e46..a4b283afd 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -16205,8 +16205,8 @@ Decl *Sema::ActOnStartOfFunctionDef(Scope *FnBodyScope, Decl *D,
 
   maybeAddDeclWithEffects(FD);
 
-  if (FD && !FD->isInvalidDecl() &&
-      FD->hasAttr<SYCLKernelEntryPointAttr>() && FnBodyScope) {
+  if (FD && !FD->isInvalidDecl() && FD->hasAttr<SYCLKernelEntryPointAttr>() &&
+      FnBodyScope) {
     // Building KernelLaunchIdExpr requires performing an unqualified lookup
     // which can only be done correctly while the stack of parsing scopes is
     // alive, so we do it here when we start parsing function body even if it is
diff --git a/clang/lib/Sema/SemaSYCL.cpp b/clang/lib/Sema/SemaSYCL.cpp
index 9e558a983..46d705c6e 100644
--- a/clang/lib/Sema/SemaSYCL.cpp
+++ b/clang/lib/Sema/SemaSYCL.cpp
@@ -456,17 +456,16 @@ ExprResult SemaSYCL::BuildSYCLKernelLaunchIdExpr(FunctionDecl *FD,
 }
 
 StmtResult SemaSYCL::BuildUnresolvedSYCLKernelCallStmt(CompoundStmt *CS,
-                                                             Expr *IdExpr) {
+                                                       Expr *IdExpr) {
   return UnresolvedSYCLKernelCallStmt::Create(SemaRef.getASTContext(), CS,
-                                                    IdExpr);
+                                              IdExpr);
 }
 
 namespace {
 
 void PrepareKernelArgumentsForKernelLaunch(SmallVectorImpl<Expr *> &Args,
-                                                  const SYCLKernelInfo *SKI,
-                                                  Sema &SemaRef,
-                                                  SourceLocation Loc) {
+                                           const SYCLKernelInfo *SKI,
+                                           Sema &SemaRef, SourceLocation Loc) {
   assert(SKI && "Need a kernel!");
   ASTContext &Ctx = SemaRef.getASTContext();
 
@@ -497,9 +496,8 @@ void PrepareKernelArgumentsForKernelLaunch(SmallVectorImpl<Expr *> &Args,
   }
 }
 
-StmtResult BuildSYCLKernelLaunchStmt(Sema &SemaRef,
-                                            const SYCLKernelInfo *SKI,
-                                            Expr *IdExpr, SourceLocation Loc) {
+StmtResult BuildSYCLKernelLaunchStmt(Sema &SemaRef, const SYCLKernelInfo *SKI,
+                                     Expr *IdExpr, SourceLocation Loc) {
   SmallVector<Stmt *> Stmts;
   assert(SKI && "Need a Kernel!");
 
diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index 355a8b0f6..037b61630 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -12884,11 +12884,11 @@ StmtResult TreeTransform<Derived>::TransformUnresolvedSYCLKernelCallStmt(
   ExprResult IdExpr = getDerived().TransformExpr(S->getKernelLaunchIdExpr());
 
   if (IdExpr.isInvalid())
-     return StmtError();
+    return StmtError();
 
   StmtResult Body = getDerived().TransformStmt(S->getOriginalStmt());
   if (Body.isInvalid())
-     return StmtError();
+    return StmtError();
 
   StmtResult SR = SemaRef.SYCL().BuildSYCLKernelCallStmt(
       cast<FunctionDecl>(SemaRef.CurContext), cast<CompoundStmt>(Body.get()),

Comment on lines +8 to +9
template<typename KernelName, typename KernelType>
void kernel_entry_point(KernelType kernel) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aelovikov-intel, noticed that [clang:sycl_kernel_entry_point(KernelName)] is missing here.

@@ -0,0 +1,22 @@
// RUN: %clang_cc1 -fsycl-is-host -ast-print %s -o - | FileCheck %s
// RUN: %clang_cc1 -fsycl-is-device -ast-print %s -o - | FileCheck %s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has no attributes and is in AST/ not in ASTSYCL/, why is that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say a combination of following existing precedent (all? of the other ast-print tests are there) and my failure to think about the fact that we recently introduced the ASTSYCL directory.

Comment on lines +37 to +38
template <typename KernelName, typename... Ts>
void sycl_kernel_launch(const char *, Ts...) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see

sycl_kernel_launch(...) {}

namespace ns {
sycl_kernel_launch(...) { /* another impl */ }

[[clang::sycl_kernel_entry_point(...)] ...;
}

somewhere too to verify what sycl_kernel_launch is found by name lookup.

In case a function with skep attribute is instantiated two times with
the same kernel name the attribute is invalid due to the conflicting name.
Make sure to exit from instantiation of UnresolvedSYCLKernelCallStmt in
this case.
…ivially-copyable (#53)

device-copyable doesn't mean trivially-copyable, so we may encounter
arguments that need cleanup. Adds test that verifies presence of the
dtor call in the synthesized code.
@YuriPlyakhin
Copy link
Contributor

nit: typo in PR description: void kernel_entry_point(KernelType kerne) { -> void kernel_entry_point(KernelType kernel) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SYCL https://registry.khronos.org/SYCL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants