-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL] Mark kernel on host for enabling optimizations. #2186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
TODO: When PR #1785 is merged, the tests need to be modified to use the new const reference syntax. |
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
// Function must return void. | ||
if (!getFunctionOrMethodResultType(D)->isVoidType()) { | ||
S.Diag(FT->getLocation(), diag::warn_sycl_attr_return_type) | ||
<< "sycl_kernel_impl"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need additional attribute?
sycl_kernel_impl
looks fully identical to sycl_kernel
attribute. Can we just use sycl_kernel
attribute instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue is the restriction on the number of template arguments and the number of function arguments on a routine marked "sycl_kernel". The implementation routines that I have seen seem to need a bit more leeway than that. Can that restriction on the "sycl_kernel" routines be lifted? If so, we can reuse the same attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant "parameters" in both places that I said "arguments" above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those restrictions are not coming from the spec, they aim to guarantee that the right function is marked with this attribute, so it can be used to construct OpenCL kernel.
We can lift some of them if we are sure that OpenCL kernel generation won't be broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this information, @Fznamznon! Since OpenCL kernel generation happens on the device, I will try to reuse the same attribute on the host for this PR purpose.
S.ConstructOpenCLKernel(FD, MC); | ||
} else if (S.LangOpts.SYCLIsHost) { | ||
auto CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl(); | ||
for (auto Method : CRD->methods()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (auto Method : CRD->methods()) | |
for (auto *Method : CRD->methods()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Because all SYCL kernel functions are template functions - they | ||
// have deferred instantination. We need bodies of these functions | ||
// so we are checking for SYCL kernel attribute after instantiation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment became out of context here. Lets remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,37 @@ | |||
// RUN: %clang_cc1 -fsycl -fsycl-is-host -fsyntax-only -verify %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge this test with existing test/SemaSYCL/kernel-attribute.cpp
test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Please review.
@@ -891,6 +891,12 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy, | |||
if (D && D->hasAttr<CFICanonicalJumpTableAttr>()) | |||
Fn->addFnAttr("cfi-canonical-jump-table"); | |||
|
|||
if (getLangOpts().SYCLIsHost && D && D->hasAttr<SYCLKernelAttr>()) { | |||
Fn->addFnAttr("sycl_kernel"); | |||
// Sycl kernel functions are not subject to inlining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Sycl kernel functions are not subject to inlining | |
// SYCL kernel functions are not subject to inlining. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Test that the kernel implementation routine marked with 'sycl_kernel' has the attribute `sycl_kernel` in the generated LLVM IR | ||
// and that the function object passed to the sycl kernel is marked 'alwaysinline' on the host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please keep these lines short? i.e. 80 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, sorry I missed this.
if (S.LangOpts.SYCLIsDevice) { | ||
S.ConstructOpenCLKernel(FD, MC); | ||
} else if (S.LangOpts.SYCLIsHost) { | ||
auto CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl(); | |
auto *CRD = (*FD->param_begin())->getType()->getAsCXXRecordDecl(); |
However, I would prefer explicit type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used explicit type here.
If the goal is to have |
The 'alwaysinline' attribute is applied on the call operator of the user-provided lambda or function object. I am not sure how that can be marked in the header file. If I have misunderstood you, please explain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit, otherwise ok.
clang/lib/Sema/SemaDeclAttr.cpp
Outdated
// Function template must have at least two template parameters to be | ||
// able to generate OpenCL SYCL kernels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Function template must have at least two template parameters to be | |
// able to generate OpenCL SYCL kernels. | |
// Function template must have at least two template parameters so it can be used | |
// in OpenCL kernel generation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
To enable better optimizations on the host, kernel implementation routines may be marked with the attribute [[clang::sycl_kernel_impl]]. The function object passed to the SYCL kernel is also marked with alwaysinline on the host to enable these optimizations to be more effective. Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
Signed-off-by: Premanand M Rao <premanand.m.rao@intel.com>
2eb8142
to
690af65
Compare
Sorry for the force-push, there were some unrelated test failures which needed changes from other commits. |
Tests still failing. |
Yes, not sure why though. They were unrelated to my changes and a restart shows a clean run. |
@bader , would it be possible to merge this soon? I'd like this to get into github today if feasible. Thanks! |
This is essentially a revert of PR intel#2186, as the supposed optimization did not materialize. I have removed the functionality added in that PR, while keeping the code cleanups.
This is essentially a revert of PR intel#2186, as the supposed optimization did not materialize. I have removed the functionality added in that PR, while keeping the code cleanups.
This is essentially a revert of PR #2186, as the supposed optimization did not materialize. I have removed the functionality added in that PR, while keeping the code cleanups.
To enable better optimizations on the host, kernel implementation
routines may be marked with the attribute [[clang::sycl_kernel]].
Previously the 'sycl_kernel' attribute was ignored on host.
The call operator of the function object passed to the SYCL kernel
is also marked with 'alwaysinline' on the host to enable these optimizations
to be more effective.
Signed-off-by: Premanand M Rao premanand.m.rao@intel.com