Skip to content

Conversation

Fznamznon
Copy link

@Fznamznon Fznamznon commented Sep 2, 2025

This adds generation of a call to sycl_enqueue_kernel_launch function aka "launcher" function. The launcher function can be a memeber of a class or a free function defined at namespace scope. The lookup is performed from SKEP attributed function scope. Because unqualified lookup requires Scope object present and it only exists during parsing stage and already EOLed at the point where templates instantiated, I had to move some parts of SYCLKernelCallStmt generation to earlier stages.

This adds generation of a call to sycl_enqueue_kernel_launch function
aka "launcher" function. The launcher function can be a memeber of a
class or a free function defined at namespace scope. The lookup is
performed from SKEP attributed function scope. Because unqualified
lookup requires Scope object present and it only exists during parsing
stage and already EOLed at the point where templates instantiated, I had
to move some parts of SYCLKernelCallStmt generation to earlier stages
and now TreeTransform knows how to process SYCLKernelCallStmt.
I also had to invent a new expression - UnresolvedSYCLKernelExpr which
represents a string containing kernel name of a kernel that doesn't
exist yet. This expression is supposed to be transformed to a
StringLiteral during template instantiation phase. It should never reach
AST consumers like CodeGen of constexpr evaluators. This still requires
more testing and FIXME cleanups, but since it evolved into a quite
complicated patch I'm pushing it for earlier feedback.
@Fznamznon
Copy link
Author

Hey, @tahonermann , I'm unable to push to your fork for some reason, so here is a PR against your branch.

Copy link
Owner

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how this is coming together. I got about half way through adding comments. I'll have some more tomorrow.

Comment on lines 16384 to 16387
auto *BodyCompound = dyn_cast_or_null<CompoundStmt>(Body);
// If Body is not a compound, that was a templated function and we don't
// need to build SYCLKernelCallStmt for it since it was already created by
// template instantiator.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little nervous about relying on a check for CompountStmt specifically here to determine what to build. I think there are three cases to consider:

  1. A function template or templated function for which an UnresolvedSYCLKernelEntryPointStmt is needed.
  2. An instantiated function template for which a SYCLKernelCallStmt was already instantiated.
  3. An explicit function template specialization or a non-template function for which a SYCLKernelCallStmt is needed.

I think these can be differentiated as follows:

  1. if FD->isTemplated() is true.
  2. otherwise, if FD->getTemplateSpecializationInfo() is non-null and isTemplateInstantiation(FD->getTemplateSpecializationKind())` is true.
  3. everything else.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. FD->getTemplateSpecializationInfo() && isTemplateInstantiation(FD->getTemplateSpecializationKind()) seems to be breaking some tests. I used FD->isTemplateInstantiation() to detect the case instead.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I added the FD->getTemplateSpecializationInfo() condition because I wasn't sure what FD->getTemplateSpecializationKind() would return for a non-template function. But if it works as expected, then great.

There is a comment in the FunctionDecl::isTemplateInstantiation() definition that I don't really know what to make of. My best guess is that it relates to a possible ambiguous classification of an instantiated dependent explicit specialization. It could also be that the FIXME comment is old/stale; I didn't dig into it. The isTemplateInstantiation() name certainly suggests it is what we want here.

bool FunctionDecl::isTemplateInstantiation() const {
  // FIXME: Remove this, it's not clear what it means. (Which template
  // specialization kind?)
  return clang::isTemplateInstantiation(getTemplateSpecializationKind());
}

}



Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other interesting cases to test. If it is easy to get these to work, great. My main concern is ensuring a reasonable diagnostic is issued if someone tries things like this.

Use of CRTP. I would expect this case to work with no additional effort.

template<typename>
struct base_handler {
  template<typename KNT, typename... Ts>
  void sycl_enqueue_kernel_launch(const char*, Ts...) {}
};
struct derived_handler : base_handler<derived_handler> {
  template<typename KNT, typename KT>
  [[clang::sycl_kernel_entry_point(KNT)]]
  void entry(KT k) { k(); }
};

Use of CRTP with a dependent base class. I would expect this case to fail because, in ordinary C++, explicit qualification is required to access a member of a dependent base class.

template<typename>
struct base_handler {
  template<typename KNT, typename... Ts>
  void sycl_enqueue_kernel_launch(const char*, Ts...) {}
};
template<int>
struct derived_handler : base_handler<derived_handler<N>> {
  template<typename KNT, typename KT>
  [[clang::sycl_kernel_entry_point(KNT)]]
  void entry(KT k) { k(); }
};

Use of a variable template as the kernel launch function. I would be surprised if this worked without additional effort required. I can potentially see some interest in this working because it avoids ADL concerns.

template<typename KNT>
struct kernel_launcher {
  template<typename... Ts>
  void operator()(const char*, Ts...) const {}
};
template<typename KNT>
kernel_launcher<KNT> sycl_enqueue_kernel_launch;
struct handler {
  template<typename KNT, typename KT>
  [[clang::sycl_kernel_entry_point(KNT)]]
  void entry(KT k) { k(); }
};

Copy link
Author

@Fznamznon Fznamznon Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of CRTP. I would expect this case to work with no additional effort.

Works.

Use of CRTP with a dependent base class. I would expect this case to fail because, in ordinary C++, explicit qualification is required to access a member of a dependent base class.

Doesn't work for that exact reason. I'll see what can be done on Monday.

Use of a variable template as the kernel launch function. I would be surprised if this worked without additional effort required. I can potentially see some interest in this working because it avoids ADL concerns.

Works.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't work for that exact reason. I'll see what can be done on Monday.

I think we should probably let this fail since we don't have a way for the programmer to express an intent to look in dependent base classes. Perhaps the diagnostic could be improved though. Or a warning issued if overload resolution could have selected it (if the call had explicit qualification) but found another candidate. I don't think much effort is warranted for this though.

That's cool that the variable template case just worked without additional effort!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the diagnostic could be improved though. Or a warning issued if overload resolution could have selected it (if the call had explicit qualification) but found another candidate. I don't think much effort is warranted for this though.

Could you please be more specific?
Right now for the following test

template<int, int = 0> struct KN;
template<typename>
struct base_handler {
  template<typename KNT, typename... Ts>
  void sycl_enqueue_kernel_launch(const char*, Ts...) {}
};

template<int N>
struct derived_handler_t : base_handler<derived_handler_t<N>> {
  template<typename KNT, typename KT>

  [[clang::sycl_kernel_entry_point(KNT)]]
  void entry(KT k) { k(); }
};

void bar() {
    derived_handler_t<13> H6;
  H6.entry<KN<13>>([](){});
}

the diagnostics emitted are

$ clang -cc1 -fsycl-is-host t39.cpp                 
t39.cpp:14:18: error: unable to find suitable 'sycl_enqueue_kernel_launch' function for host code synthesis
   14 |   void entry(KT k) { k(); }                                                                        
      |                  ^                                                                                 
t39.cpp:14:18: note: define 'sycl_enqueue_kernel_launch' function template to fix this problem             
1 error generated.                                                                                         

What else could we potentially say?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do something like below. I don't know if this would be difficult or not.

$ clang -cc1 -fsycl-is-host t39.cpp                 
t39.cpp:14:18: error: unable to find suitable 'sycl_enqueue_kernel_launch' function for host code synthesis
   14 |   void entry(KT k) { k(); }                                                                        
      |                  ^                                                                                 
t39.cpp:5:8: note: explicit qualification required to use member 'sycl_enqueue_kernel_launch' from dependent base class; define 'sycl_enqueue_kernel_launch' in the derived class and forward arguments to the base class member.
1 error generated.       

From reviewing other diagnostics, I see that lookup in dependent base classes might work as a Microsoft extension. Does that case work if -fms-extensions is passed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that case work if -fms-extensions is passed?

No

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the extension isn't tied to -fms-extensions or -fms-compatibility, but rather to the target. See https://godbolt.org/z/qeEGh7qaG. I suspect the test will pass if the target is set to x86_64-windows-msvc. MSVC accepts the code in the link in C++17 or earlier modes or when proper two phase lookup is disabled with /ZctwoPhase- for C++20 or later.

Copy link
Author

@Fznamznon Fznamznon Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tahonermann , do we want to accept this or do we want to diagnose this? It seems it is accepted for MSVC due to additional hacks somewhere. I see that for normal C++ path and MSVC we actually come to the CallExpr generation during template instantiation with an empty LookupResult object and built some kind of "recovery" CallExpr.

It seems both diagnostics - the error and the msvc extension warning are emitted by Sema::DiagnoseEmptyLookup which then calls DiagnoseDependentMemberLookup. What I don't like here is that it walks DeclContexts and "emulates" the lookup. I definitely don't want to write that. I could adapt Sema::DiagnoseEmptyLookup for SYCL but that would need additional flags to control its behavior for that special SYCL case which I'm not sure is worth it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not inclined to do anything special for this case. How about we add RUN lines that exercise the x86_64-windows-msvc target and conditionalize any differences in expected diagnostics accordingly? I would expect only the dependent CRTP case to be affected.

My earlier investigation of the MSVC compatibility was incomplete. I remember being frustrated at the time that I couldn't get the ext_found_in_dependent_base diagnostic to be issued. It seems that diagnostic is only issued if the derived class template is instantiated and my test didn't do that. https://godbolt.org/z/E1ToqrdKb demonstrates the extension warning and shows that, contrary to my expectations, MSVC does actually reject such use (in C++20 mode; it accepts it in C++17 mode; clang doesn't differentiate those cases which is fine). Your test already does this, so once additional RUN lines for Windows are added, that diagnostic should be captured by the test. I think this suffices to address this corner case. For anyone that wants this to work, they can add an inline sycl_kernel_launch() member to the derived class template that forwards to the dependent base class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the Windows RUN line.

Co-authored-by: Tom Honermann <tom@honermann.net>
Copy link
Owner

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more minor comments. I'm inclined to merge this once we finish with these.

Copy link
Owner

@tahonermann tahonermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Fznamznon, merging...

@tahonermann tahonermann merged commit dd8f6cf into tahonermann:sycl-upstream-fe-sycl_kernel_entry_point-host Sep 29, 2025
Copy link

@aelovikov-intel aelovikov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a doc/<something> that would provide the snippet of the code generated by the FE, unless destination branch already has that.

@tahonermann
Copy link
Owner

I think this needs a doc/<something> that would provide the snippet of the code generated by the FE, unless destination branch already has that.

@aelovikov-intel, the destination branch has that documentation here. This includes some of my own changes as well as Mariya's; see the full PR here. Feel free to review and comment on that PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants